Preface


The Research Data Curation and Management Bibliography includes over 800 
selected English-language articles and books that are useful in understanding the 
curation of digital research data in academic and other research institutions. 


The "digital curation" concept is still evolving. In "Digital Curation and Trusted 
Repositories: Steps toward Success," Christopher A. Lee and Helen R. Tibbo 
define digital curation as follows: 


Digital curation involves selection and appraisal by creators and archivists; 
evolving provision of intellectual access; redundant storage; data 
transformations; and, for some materials, a commitment to long-term 
preservation. Digital curation is stewardship that provides for the 
reproducibility and re-use of authentic digital data and other digital assets. 
Development of trustworthy and durable digital repositories; principles of 
sound metadata creation and capture; use of open standards for file formats 
and data encoding; and the promotion of information management literacy are 
all essential to the longevity of digital resources and the success of curation 
efforts.1


The Research Data Curation and Management Bibliography covers topics such as 
research data creation, acquisition, metadata, provenance, repositories, 
management, policies, support services, funding agency requirements, open access, 
peer review, publication, citation, sharing, reuse, and preservation. It is highly 
selective in its coverage. 


The bibliography does not cover conference proceedings, digital media works 
(such as MP3 files), editorials, e-mail messages, interviews, letters to the editor, 
presentation slides or transcripts, technical reports, unpublished e-prints, or weblog
postings. 


Most sources have been published from January 2009 through December 2019; 
however, a limited number of earlier key sources are also included. The 
bibliography has links to included works. URLs may alter without warning (or 
automatic forwarding) or they may disappear altogether. Where possible, this 
bibliography uses Digital Object Identifier System (DOI) URLs. DOIs are not 
rechecked after initial validation. Publisher systems may have temporary DOI
resolution problems. Should a link be dead, try entering it in the Internet Archive
Wayback Machine. 
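
As an illustration of the dead-link advice above, the following minimal Python
sketch builds a Wayback Machine lookup address for a given link. The function
name and its parameters are hypothetical, chosen here for illustration; the
sketch assumes the Wayback Machine's public address pattern
(web.archive.org/web/<timestamp>/<original URL>), where "*" asks for the list
of all captures. It only constructs the address and does not contact the
Internet Archive.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, timestamp: str = "*") -> str:
    """Return a Wayback Machine address for `url`.

    `timestamp` is either "*" (list every capture) or a date prefix such as
    "2019" or "20191231"; the Wayback Machine then redirects to the capture
    closest to that date. Both names are illustrative, not part of any
    official client library.
    """
    return f"{WAYBACK_PREFIX}/{timestamp}/{url}"


if __name__ == "__main__":
    # Look up captures of a DOI link cited in this bibliography.
    print(wayback_url("https://doi.org/10.1007/s00799-016-0178-2"))
```

Pasting the printed address into a browser shows whether any archived copy of
the link exists.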


Abstracts are included in this bibliography if a work is under a Creative Commons 
Attribution License (BY and national/international variations), a Creative 
Commons public domain dedication (CC0), or a Creative Commons Public
Domain Mark and this is clearly indicated in the publisher’s current webpage for 
the article. Note that a publisher may have changed the licenses for all articles on a 
journal’s website but not have made corresponding license changes in the journal’s
PDF files. The license on the current webpage is deemed to be the correct one. 
Since publishers can change licenses in the future, the license indicated for a work 
in this bibliography may not be the one you find upon retrieval of the work. 


Unless otherwise noted, article abstracts in this bibliography are under a Creative 
Commons Attribution 4.0 International License, 
https://creativecommons.org/licenses/by/4.0/. Abstracts are reproduced as written 
in the source material. 


See the Creative Commons' Frequently Asked Questions for a discussion of how 
documents under different Creative Commons licenses can be combined. 


Research data publishing is intended as the release of research data to make it
possible for practitioners to (re)use them according to "open science" 


a collaborative effort to build a community of practice of librarians tasked
with addressing the research data needs of their campuses, describes how this 
effort was evaluated, and presents future opportunities. 


Roundtables planning committee developed a low-cost professional
development day divided into two parts: a morning session that detailed an 


discussion on practical aspects of research data services. Evaluations from
these events were coded in NVivo and we report on the common themes. 


Results: Participants returned sixty-one evaluations from four events. Five
themes emerged from the evaluations: learning, sharing, format, networking, 
and empathy. 


Conclusion: The events provide a valuable professional development 
experience for attendees, and the authors hope that by providing a description 
of the events' development, others will establish their own local communities 
of practice. 


Austin, Claire C., Theodora Bloom, Sünje Dallmeier-Tiessen, Varsha K. Khodiyar,
Fiona Murphy, Amy Nurnberger, Lisa Raymond, Martina Stockhause, Jonathan 
Tedds, Mary Vardigan, and Angus Whyte. "Key Components of Data Publishing: 
Using Current Best Practices to Develop a Reference Model for Data Publishing." 
International Journal on Digital Libraries 18, no. 2 (2017): 77-92. 
https://doi.org/10.1007/s00799-016-0178-2 


Austin, Claire C., Susan Brown, Nancy Fong, Chuck Humphrey, Amber Leahey, 
and Peter Webster. "Research Data Repositories: Review of Current Features, Gap 
Analysis, and Recommendations for Minimum Requirements." IASSIST Quarterly
39, no. 4 (2015): 24-38. https://doi.org/10.29173/iq904


Aydinoglu, Arsev Umur, Guleda Dogan, and Zehra Taskin. "Research Data
Management in Turkey: Perceptions and Practices." Library Hi Tech 35, no. 2 
(2017): 271-289. https://doi.org/10.1108/LHT-11-2016-0134 


Bache, Richard, Simon Miles, Bolaji Coker, and Adel Taweel. "Informative 
Provenance for Repurposed Data: A Case Study using Clinical Research Data." 
International Journal of Digital Curation 8, no. 2 (2013): 27-46. 
https://doi.org/10.2218/ijdc.v8i2.262 


unintended research objectives is a non-trivial problem because the mappings
required may not be precise. A particular case is clinical data collected for 
patient care being used for medical research. The fact that research 
repositories will record data differently means that assumptions must be made 
as how to transform of this data. Records of provenance that document how 


is transformed using comparable assumptions. For a provenance-based
approach to be reusable and supportable with software tools, the provenance 
records must use a well-defined model of the transformation process. In this 


researchers in sharing research data at the university level. The outcomes of
the survey will help the researchers to develop appropriate data literacy 
awareness programmes meant to stimulate growth in data sharing practices 


Oceanography during the same years. I also find that methods of citing and
referring to these data sets in scientific publications are highly inconsistent, 


apparent instantly. Active use can arise in various ways, several of which are
being investigated by the Collaboration for Research Enhancement by Active 
use of Metadata (CREAM) project, which was funded by Jisc as part of their 


propelled by funding requirements, journal publishers, local campus policies,
or community-driven expectations of more collaborative and interdisciplinary 
research environments. However, it is not well understood how researchers 
are addressing these expectations and whether they are transitioning from 
individualized practices to more thoughtful and potentially public approaches 
to data sharing that will enable reuse of their data. METHODS The University 
of Minnesota Libraries conducted a local opt-in study of data management 
plans (DMPs) included in funded National Science Foundation (NSF) grant 
proposals from January 2011 through June 2014. In order to understand the 
current data management and sharing practices of campus researchers, we 
solicited, coded, and analyzed 182 DMPs, accounting for 41% of the total 
number of plans available. RESULTS DMPs from seven colleges and 
academic units were included. The College of Science of Engineering 
accounted for 70% of the plans in our review. While 96% of DMPs 
mentioned data sharing, we found a variety of approaches for how PIs shared 
their data, where data was shared, the intended audiences for sharing, and 
practices for ensuring long-term reuse. CONCLUSION DMPs are useful tools 
to investigate researchers’ current plans and philosophies for how research 
outputs might be shared. Plans and strategies for data sharing are inconsistent 
across this sample, and researchers need to better understand what kind of 
sharing constitutes public access. More intervention is needed to ensure that 
researchers implement the sharing provisions in their plans to the fullest


targeted data services for researchers that aim to increase the impact of
institutional research. 


For open science to flourish, data and any related digital outputs should be
discoverable and re-usable by a variety of potential consumers. The recent 
FAIR Data Principles produced by the Future of Research Communication 
and e-Scholarship (FORCE11) collective provide a compilation of 
considerations for making data findable, accessible, interoperable, and re- 
usable. The principles serve as guideposts to 'good' data management and 
stewardship for data and/or metadata. On a conceptual level, the principles 
codify best practices that managers and stewards would find agreement with, 
exist in other data quality metrics, and already implement. This paper reports 
on a secondary purpose of the principles: to inform assessment of data's 
FAIR-ness or, put another way, data's fitness for use. Assessment of FAIR- 
ness likely requires more stratification across data types and among various 
consumer communities, as how data are found, accessed, interoperated, and 
re-used differs depending on types and purposes. This paper's purpose is to 
present a method for qualitatively measuring the FAIR Data Principles 
through operationalizing findability, accessibility, interoperability, and re- 
usability from a re-user's perspective. The findings may inform assessments 
that could also be used to develop situationally-relevant fitness for use 


Secondary analysis of qualitative data entails reusing data created from
previous research projects for new purposes. Reuse provides an opportunity to 
study the raw materials of past research projects to gain methodological and 
substantive insights. In the past decade, use of the approach has grown rapidly 
in the United Kingdom to become sufficiently accepted that it must now be 
regarded as mainstream. Several factors explain this growth: the open data 
movement, research funders’ and publishers’ policies supporting data sharing, 
and researchers seeing benefits from sharing resources, including data. 
Another factor enabling qualitative data reuse has been improved services and 
infrastructure that facilitate access to thousands of data collections. The UK 
Data Service is an example of a well-established facility; more recent has 
been the proliferation of repositories being established within universities. 
This article will provide evidence of the growth of data reuse in the United 
Kingdom and in Finland by presenting both data and case studies of reuse that 
illustrate the breadth and diversity of this maturing research method. We use 
two distinct data sources that quantify the scale, types, and trends of reuse of 
qualitative data: (a) downloads of archived data collections held at data 
repositories and (b) publication citations. Although the focus of this article is 
on the United Kingdom, some discussion of the international environment is 
provided, together with data and examples of reuse at the Finnish Social 
Science Data Archive. The conclusion summarizes the major findings, 
including some conjectures regarding what makes qualitative data attractive 
for reuse and sharing. 


Efforts to make research results open and reproducible are increasingly
reflected by journal policies encouraging or mandating authors to provide 
data availability statements. As a consequence of this, there has been a strong 
uptake of data availability statements in recent literature. Nevertheless, it is 
still unclear what proportion of these statements actually contain well-formed 
links to data, for example via a URL or permanent identifier, and if there is an 
added value in providing such links. We consider 531,889 journal articles
published by PLOS and BMC, develop an automatic system for labelling their 
data availability statements according to four categories based on their 
content and the type of data availability they display, and finally analyze the 
citation advantage of different statement categories via regression. We find 
that, following mandated publisher policies, data availability statements 
become very common. In 2018 93.7% of 21,793 PLOS articles and 88.2% of 
31,956 BMC articles had data availability statements. Data availability 
statements containing a link to data in a repository—rather than being 
available on request or included as supporting information files—are a 
fraction of the total. In 2017 and 2018, 20.8% of PLOS publications and 
12.2% of BMC publications provided DAS containing a link to data in a 
repository. We also find an association between articles that include 
statements that link to data in a repository and up to 25.36% (± 1.07%) higher
citation impact on average, using a citation prediction model. We discuss the 
potential implications of these results for authors (researchers) and journal 
publishers who make the effort of sharing their data in repositories. All our 
data and code are made available in order to reproduce and extend our results. 


Cole, Gareth John. "Establishing a Research Data Management Service at 
Loughborough University." International Journal of Digital Curation 11, no. 1 
(2016): 68-75. https://doi.org/10.2218/ijdc.v11i1.407


In common with most UK universities Loughborough University needed to be 
compliant with the EPSRC Data Expectations by May 2015. This paper 
explains the process the University went through to meet these expectations. 
The paper also demonstrates how University senior management took the 
opportunity to look beyond compliance with EPSRC requirements. Project 
staff were challenged to identify a solution which would help to increase the 
University's research visibility and reach. The solution to all of these 
challenges is an innovative and ground-breaking relationship between the 
University and three external partners. Investment has also been made in
professional services staff to help manage and oversee the service. This paper 
explores the ways in which each element of Loughborough's research data 
service helps to reduce the burden on researchers, how much of the 
infrastructure is invisible to the research community, and how the service is 
being embedded in existing infrastructure and workflows. 


Collie, W. Aaron, and Michael Witt. "A Practice and Value Proposal for Doctoral 
Dissertation Data Curation." International Journal of Digital Curation 6, no. 2 
(2011): 165-175. https://doi.org/10.2218/ijdc.v6i2.194 


Collins, Ellen. "Use and Impact of UK Research Data Centres." International 
Journal of Digital Curation 6, no. 1 (2011): 20-31. 
https://doi.org/10.2218/ijdc.v6i1.169


Conrad, Anders Sparre, Rasmus Handberg, and Michael Svendsen. "Reuse for 
Research: Curating Astrophysical Datasets for Future Researchers." International 
Journal of Digital Curation 12, no. 2 (2017): 37-46. 
https://doi.org/10.2218/ijdc.v12i2.516


"Our data are going to be valuable for science for the next 50 years, so please 
make sure you preserve them and keep them accessible for active research for 
at least that period." 


These were approximately the words used by the principal investigator of the 
Kepler Asteroseismic Science Consortium (KASC) when he presented our 
task to us. The data in question consists of data products produced by KASC 
researchers and working groups as part of their research, as well as underlying 
data imported from the NASA archives. 


The overall requirements for 50 years of preservation while, at the same time, 
enabling reuse of the data for active research presented a number of specific 
challenges, closely intertwining data handling and data infrastructure with 
scientific issues. This paper reports our work to deliver the best possible 
solution, performed in close cooperation between the research team and 
library personnel. 


Network modelling provides a framework for the systematic analysis of needs
and options for preservation. A number of general strategies can be identified, 
characterised and applied to many situations; these strategies may be 
combined to produce robust preservation solutions tailored to the needs of the 
community and responsive to their environment. This paper provides an 
overview of this approach. We describe the components of a Preservation 
Network Model and go on to show how it may be used to plan preservation 
actions according to the requirements of the particular situation using 
illustrative examples from scientific archives. 


Much time and energy is now being devoted to developing the skills of
researchers in the related areas of data analysis and data management. 
However, less attention is currently paid to developing the data skills of 
librarians themselves: these skills are often brought in by recruitment in niche 
areas rather than considered as a wider development need for the library 
workforce, and are not widely recognised as important to the professional 
career development of librarians. We believe that building computational and 
data science capacity within academic libraries will have direct benefits for 
both librarians and the users we serve. 


Library Carpentry is a global effort to provide training to librarians in 
technical areas that have traditionally been seen as the preserve of 
researchers, IT support and systems librarians. Established non-profit 
volunteer organisations, such as Software Carpentry and Data Carpentry,
offer introductory research software skills training with a focus on the needs 
and requirements of research scientists. Library Carpentry is a comparable 
introductory software skills training programme with a focus on the needs and 
requirements of library and information professionals. This paper describes 
how the material was developed and delivered, and reports on challenges 
faced, lessons learned and future plans. 


Over the last years, many organizations have been working on infrastructure
to facilitate sharing and reuse of research data. This means that researchers 
now have ways of making their data available, but not necessarily incentives 
to do so. Several Research Data Alliance (RDA) working groups have been 
working on ways to start measuring activities around research data to provide 
input for new Data Level Metrics (DLMs). These DLMs are a critical step 
towards providing researchers with credit for their work. In this paper, we 
describe the outcomes of the work of the Scholarly Link Exchange (Scholix) 
working group and the Data Usage Metrics working group. The Scholix 
working group developed a framework that allows organizations to expose 
and discover links between articles and datasets, thereby providing an 
indication of data citations. The Data Usage Metrics group works on a 
standard for the measurement and display of Data Usage Metrics. Here we
explain how publishers and data repositories can contribute to and benefit 
from these initiatives. Together, these contributions feed into several hubs 
that enable data repositories to start displaying DLMs. Once these DLMs are 
available, researchers are in a better position to make their data count and be 
rewarded for their work. 


This article reports an international study of research data management
(RDM) activities, services, and capabilities in higher education libraries. It 
presents the results of a survey covering higher education libraries in 
Australia, Canada, Germany, Ireland, the Netherlands, New Zealand, and the 
UK. The results indicate that libraries have provided leadership in RDM, 
particularly in advocacy and policy development. Service development is still 
limited, focused especially on advisory and consultancy services (such as data 
management planning support and data-related training), rather than technical 
services (such as provision of a data catalog, and curation of active data). 
Data curation skills development is underway in libraries, but skills and 
capabilities are not consistently in place and remain a concern. Other major 
challenges include resourcing, working with other support services, and 
achieving "buy in" from researchers and senior managers. Results are 
compared with previous studies in order to assess trends and relative maturity 
levels. The range of RDM activities explored in this study are positioned on a
"landscape maturity model," which reflects current and planned research data
services and practice in academic libraries, representing a "snapshot" of 
current developments and a baseline for future research. 


University libraries have played an important role in constructing an
infrastructure of support for Research Data Management at an institutional 
level. This paper presents a comparative analysis of two international surveys 
of libraries about their involvement in Research Data Services conducted in 
2014 and 2018. The aim was to explore how services had developed over this 
time period, and to explore the drivers and barriers to change. In particular, 
there was an interest in how far the FAIR data principles had been adopted. 


Services in nearly every area were more developed in 2018 than before, but 
technical services remained less developed than advisory. Progress on 
institutional policy was also evident. However, priorities did not seem to have 
shifted significantly. Open ended answers suggested that funder policy, rather 
than researcher demand, remained the main driver of service development and 
that resources and skills gaps remained issues. While widely understood as an 
important reference point and standard, because of their relatively recent 
publication date, FAIR principles had not been widely adopted explicitly in 
policy. 


The purpose of this paper is to explore the value to librarians of seeing
research data management as a 'wicked' problem. Wicked problems are 
unique, complex problems which are defined differently by different 
stakeholders making them particularly intractable. Data from 26 semi- 
structured in-depth telephone interviews with librarians was analysed to see 
how far their perceptions of research data management aligned with the 16 
features of a wicked problem identified from the literature. To a large extent 
research data management is perceived to be wicked, though over time good 
practices may emerge to help to 'tame' the problem. How interviewees 
thought research data management should be approached reflected this 
realisation. The generic value of the concept of wicked problems is 
considered and some first thoughts about how the curriculum for new entrants
to the profession can prepare them for such problems are presented. 


This work is licensed under a Creative Commons Attribution 3.0 Unported 
License. https://creativecommons.org/licenses/by/3.0/.


In this context, JISC have funded the White Rose consortium of academic
libraries at Leeds, Sheffield and York, working closely with the Sheffield
Information School, in the RDMRose Project, to develop
learning materials that will help librarians grasp the opportunity that RDM 
offers. The learning materials will be used in the Information School's 
Masters courses, and are also to be made available to other information sector 
training providers on a share-alike licence. A version will also be made 
available (from January 2013) as an Open Educational Resource for use by 
information professionals who want to update their competencies as part of 
their continuing professional development (CPD). The learning materials are 
being developed specifically for liaison librarians, to upskill existing 
professionals and to expand the knowledge base for new entrants to 
librarianship. It is hoped to accommodate the perspectives of any information 
professional, but the scope is not intended to encompass a syllabus for a data 
management specialist role (following the distinction made by Corrall [1]). 


This article summarises current thinking developed within the project about 
the scope and level of such learning materials. This thinking is based on a 
number of sources: the literature and existing curricula and also the project 
vision and data collected during the project in focus groups with staff at the 
participating libraries. 


Scientific workflows and their supporting systems are becoming increasingly
popular for compute-intensive and data-intensive scientific experiments. The 
advantages scientific workflows offer include rapid and easy workflow 
design, software and data reuse, scalable execution, sharing and collaboration, 
and other advantages that altogether facilitate "reproducible science". In this 
context, provenance—information about the origin, context, derivation, 
ownership, or history of some artifact—plays a key role, since scientists are 
interested in examining and auditing the results of scientific experiments. 


However, in order to perform such analyses on scientific results as part of 
extended research collaborations, an adequate environment and tools are 
required. Concretely, the need arises for a repository that will facilitate the 
sharing of scientific workflows and their associated execution traces in an 
interoperable manner, also enabling querying and visualization. Furthermore, 
such functionality should be supported while taking performance and 
scalability into account. 


With this purpose in mind, we introduce PBase: a scientific workflow 
provenance repository implementing the ProvONE proposed standard, which 
extends the emerging W3C PROV standard for provenance data with 
workflow specific concepts. PBase is built on the Neo4j graph database, thus 
offering capabilities such as declarative and efficient querying. Our 
experiences demonstrate the power gained by supporting various types of 
queries for provenance data. In addition, PBase is equipped with a user 
friendly interface tailored for the visualization of scientific workflow 
provenance data, making the specification of queries and the interpretation of 
their results easier and more effective. 


The implementation of a scientific research data management system is an
important task within long-term, interdisciplinary research projects. Besides 
sustainable storage of data, including accurate descriptions with metadata, 
easy and secure exchange and provision of data is necessary, as well as 
backup and visualisation. The design of such a system poses challenges and 
problems that need to be solved. 


This paper describes the practical experiences gained by the implementation 
of a scientific research data management system, established in a large, 
interdisciplinary research project with focus on Soil-Vegetation-Atmosphere 
Data. 


Curty, Renata Gonçalves. "Factors Influencing Research Data Reuse in the Social
Sciences: An Exploratory Study." International Journal of Digital Curation 11, no.
1 (2016): 96-117. https://doi.org/10.2218/ijdc.v11i1.401


The development of e-Research infrastructure has enabled data to be shared 
and accessed more openly. Policy mandates for data sharing have contributed 
to the increasing availability of research data through data repositories, which 
create favourable conditions for the re-use of data for purposes not always 
anticipated by original collectors. Despite the current efforts to promote 
transparency and reproducibility in science, data re-use cannot be assumed, 
nor merely considered a 'thrifting' activity where scientists shop around in 
data repositories considering only the ease of access to data. The lack of an 
integrated view of individual, social and technological influential factors to 
intentional and actual data re-use behaviour was the key motivator for this 
study. Interviews with 13 social scientists produced 25 factors that were 
found to influence their perceptions and experiences, including both their 
unsuccessful and successful attempts to re-use data. These factors were 
grouped into six theoretical variables: perceived benefits, perceived risks, 
perceived effort, social influence, facilitating conditions, and perceived re- 
usability. These research findings provide an in-depth understanding about 
the re-use of research data in the context of open science, which can be
valuable in terms of theory and practice to help leverage data re-use and make 
publicly available data more actionable. 


The value of sharing scientific research data is widely appreciated, but factors
that hinder or prompt the reuse of data remain poorly understood. Using the 
Theory of Reasoned Action, we test the relationship between the beliefs and 
attitudes of scientists towards data reuse, and their self-reported data reuse 
behaviour. To do so, we used existing responses to selected questions from a 
worldwide survey of scientists developed and administered by the DataONE 
Usability and Assessment Working Group (thus practicing data reuse 
ourselves). Results show that the perceived efficacy and efficiency of data 
reuse are strong predictors of reuse behaviour, and that the perceived 
importance of data reuse corresponds to greater reuse. Expressed lack of trust 
in existing data and perceived norms against data reuse were not found to be 
major impediments for reuse contrary to our expectations. We found that 
reported use of models and remotely-sensed data was associated with greater 
reuse. The results suggest that data reuse would be encouraged and 
normalized by demonstration of its value. We offer some theoretical and 
practical suggestions that could help to legitimize investment and policies in 
favor of data sharing. 


The data curation community has long encouraged researchers to document
collected research data during active stages of the research workflow, to 
provide robust metadata earlier, and support research data publication and 
preservation. Data documentation with robust metadata is one of a number of 
steps in effective data publication. Data publication is the process of making 
digital research objects 'FAIR', i.e. findable, accessible, interoperable, and
reusable; attributes increasingly expected by research communities, funders 
and society. Research data publishing workflows are the means to that end. 
Currently, however, much published research data remains inconsistently and 
inadequately documented by researchers. Documentation of data closer in 
time to data collection would help mitigate the high cost that repositories 
associate with the ingest process. More effective data publication and sharing 
should in principle result from early interactions between researchers and 
their selected data repository. This paper describes a short study undertaken 
by members of the Research Data Alliance (RDA) and World Data System 
(WDS) working group on Publishing Data Workflows. We present a 
collection of recent examples of data publication workflows that connect data 
repositories and publishing platforms with research activity ‘upstream’ of the 
ingest process. We re-articulate previous recommendations of the working 
group, to account for the varied upstream service components and platforms 
that support the flow of contextual and provenance information downstream. 
These workflows should be open and loosely coupled to support 
interoperability, including with preservation and publication environments. 
Our recommendations aim to stimulate further work on researchers' views of 
data publishing and the extent to which available services and infrastructure 
facilitate the publication of FAIR data. We also aim to stimulate further 
dialogue about, and definition of, the roles and responsibilities of research 
data services and platform providers for the 'FAIRness' of research data 
publication workflows themselves. 

INTRODUCTION Research Data Management (RDM) offers opportunities
and challenges at the interface of library support and researcher needs. 
Libraries are in a position of balancing the capacity to provide support at the 
point of need while also implementing training for subject liaison librarians 
grounded in the practical issues and realities facing researchers and their 
institutions. DESCRIPTION OF PROGRAM/SERVICE The North Carolina 
State University (NCSU) Libraries has deployed a Data Management Plan 
(DMP) Review service managed by a committee of librarians with diverse 
experience in data management and domain expertise. By rotating librarians 
through membership on the committee and by inviting subject liaison
librarians to participate in the DMP Review process, our training ground 
model aims to develop needed competencies and support researchers through 
relevant services and partnerships. AUDIT OF PROGRAM/SERVICE This 
article presents an audit of the DMP Review service as a training ground to 
develop and enhance competencies as identified by the Joint Task Force on 
Librarians' Competencies in Support of E-Research and Scholarly 
Communication. NEXT STEPS AND CONCLUSIONS The DMP Review 
service creates opportunities for librarians to learn valuable skills while 
simultaneously providing a time-sensitive service to researchers. The process 
of auditing competencies developed by participating in the DMP Review 
service highlights gaps needed to more fully support RDM and reinforces the 
capacity of the DMP Review service as a training ground to sustain and 
iterate learning opportunities for librarians engaged in research support and 
partnerships. 

their organization. For the implementation of this data management policy,
high quality support for researchers and an adequate technical infrastructure 

researchers.

The Data Seal of Approval (DSA) is one of the most widely used standards
for Trusted Digital Repositories to date. Those who developed this standard 
have articulated seven main benefits of acquiring DSAs: 1) stakeholder 
confidence, 2) improvements in communication, 3) improvement in 
processes, 4) transparency, 5) differentiation from others, 6) awareness 

data collected at Oak Ridge National Laboratory's neutron sources. Operation

who create them. The recent emphasis on open data is shifting the focus to
ensure that the data produced are reusable by others. This mixed methods 
research study included a series of surveys and focus group interviews in 
which 13 data consumers, data managers, and data producers answered 
questions about their perspectives on sharing neutron data. Data consumers 

The increasing use of geospatial data and related electronic records presents
new challenges for these organizations, which have relied on traditional 

As public investment in archiving research data grows, there has been
increasing attention to the longevity or sustainability of the data repositories 
that curate such data. While there have been many conceptual frameworks 
developed and case reports of individual archives and digital repositories, 
there have been few empirical studies of how such archives persist over time. 
In this paper, we draw upon organizational studies theories to approach the 
issue of sustainability from an organizational perspective, focusing 
specifically on the organizational histories of three social science data 
archives (SSDA): ICPSR, UKDA, and LIS. Using a framework of 
organizational resilience to understand how archives perceive crisis, respond 
to it, and learn from experience, this article reports on an empirical study of 
sustainability in these long-lived SSDAs. The study draws from archival 

(https://databank.illinois.edu/), to provide Illinois researchers with a free, self-

and reliable access to Illinois research data. This article presents a holistic
view of development by discussing our overarching technical, policy, and 
interface strategies. By openly presenting our design decisions, the rationales 
behind those decisions, and associated challenges, this paper aims to
contribute to the library community's work to develop repository services that 
meet growing data preservation and sharing needs. 

Few would question the need to archive the scientific and technical (S&T)
data generated by researchers. At a minimum, the data are needed for change 
analysis. Likewise, most people would value efforts to ensure the preservation 
of the archived S&T data. Future generations will use analysis techniques not 
even considered today. Until recently, archiving and preserving these data 
were usually accomplished within existing infrastructures and budgets. As the 
volume of archived data increases, however, organizations charged with 
archiving S&T data will be increasingly challenged (U.S. General Accounting 
Office, 2002). The U.S. Geological Survey has had experience in this area 
and has developed strategies to deal with the mountain of land remote sensing 
data currently being managed and the tidal wave of expected new data. The 
Agency has dealt with archiving issues, such as selection criteria, purging, 
advisory panels, and data access, and has met with preservation challenges 
involving photographic and digital media. That experience has allowed the 

This paper details how the U.S. Geological Survey (USGS) Community for
Data Integration (CDI) Data Management Working Group developed a 
Science Data Lifecycle Model, and the role the Model plays in shaping 
agency-wide policies and data management applications. Starting with an 
extensive literature review of existing data lifecycle models, representatives 
from various backgrounds in USGS attended a two-day meeting where the 
basic elements for the Science Data Lifecycle Model were determined. 
Refinements and reviews spanned two years, leading to finalization of the 
model and documentation in a formal agency publication.


The Model serves as a critical framework for data management policy, 
instructional resources, and tools. The Model helps the USGS address both 
the Office of Science and Technology Policy (OSTP) for increased public
access to federally funded research, and the Office of Management and
Budget (OMB) 2013 Open Data directives, as the foundation for a series of
agency policies related to data management planning, metadata development,
data release procedures, and the long-term preservation of data. Additionally,
the agency website devoted to data management instruction and best practices
(www2.usgs.gov/datamanagement) is designed around the Model's structure
and concepts. This paper also illustrates how the Model is being used to 
develop tools for supporting USGS research and data management processes. 

This study explores how researchers at a major Midwestern university are
managing their data, as well as the factors that have shaped their practices and 
those that motivate or inhibit changes to that practice. A combination of 
survey (n=363) and interview data (n=15) yielded both qualitative and 
quantitative results bearing on my central research question: In what types of 
data management activities do researchers at this institution engage? 
Corollary to that, I also explored the following questions: What do 
researchers feel could be improved about their data management practices? 
Which services might be of interest to them? How do they feel those services
could most effectively be implemented? 


In this paper, I situate researchers' data management practices within a theory 
of personal information management. I present a view of data management 
and preservation needs from researchers’ perspectives across a range of 
domains. Additionally, I discuss the implications that understanding research 
data management as personal information management has for introducing 
services to support and improve data management practice. 

Despite widespread support from policy makers, funding agencies, and
scientific journals, academic researchers rarely make their research data 
available to others. At the same time, data sharing in research is attributed a 
vast potential for scientific progress. It allows the reproducibility of study 
results and the reuse of old data for new research questions. Based on a 
systematic review of 98 scholarly papers and an empirical survey among 603 
secondary data users, we develop a conceptual framework that explains the 
process of data sharing from the primary researcher's point of view. We show 
that this process can be divided into six descriptive categories: Data donor, 
research organization, research community, norms, data infrastructure, and 
data recipients. Drawing from our findings, we discuss theoretical 
implications regarding knowledge creation and dissemination as well as 
research policy measures to foster academic collaboration. We conclude that 
research data cannot be regarded as knowledge commons, but research 
policies that better incentivise data sharing are needed to improve the quality 
of research results and foster scientific progress. 

In eScience, where vast data collections are processed in scientific workflows,
new risks and challenges are emerging. Those challenges are changing the 
eScience paradigm, mainly regarding digital preservation and scientific 
workflows. To address specific concerns with data management in these 
scenarios, the concept of the Data Management Plan was established, serving 
as a tool for enabling digital preservation in eScience research projects. We 
claim risk management can be jointly used with a Data Management Plan, so 
new risks and challenges can be easily tackled. Therefore, we propose an 
analysis process for eScience projects using a Data Management Plan and 
ISO 31000 in order to create a Risk Management Plan that can complement 
the Data Management Plan. The motivation, requirements and validation of 
this proposal are explored in the MetaGen-FRAME project, focused on
Metagenomics. 

Initiatives for sharing research data are opportunities to increase the pace of
knowledge discovery and scientific progress. The reuse of research data has 
the potential to avoid the duplication of data sets and to bring new views from 
multiple analyses of the same data set. For example, the study of genomic
variations associated with cancer profits from the universal collection of such 
data and helps in selecting the most appropriate therapy for a specific patient. 
However, data sharing poses challenges to the scientific community. These 
challenges are of ethical, cultural, legal, financial, or technical nature. This 
article reviews the impact that data sharing has in science and society and 
presents guidelines to improve the efficient sharing of research data. 

Scientific data management is performed to ensure that data are curated in a
manner that supports their qualified reuse. Curation usually involves actions 
that must be performed by those who capture or generate data and by a 
facility with the capability to sustainably archive and publish data beyond an 
individual project's lifecycle. The Australian Antarctic Data Centre is such a 
facility. How this centre is approaching the administration of Antarctic 
science data is described in the following paper and serves to demonstrate key 
facets necessary for undertaking polar data management in an increasingly 
connected global data environment. 

There is significant friction in the acquisition, sharing, and reuse of research
data. It is estimated that eighty percent of data analysis is invested in the 
cleaning and mapping of data (Dasu and Johnson, 2003). This friction 
hampers researchers not well versed in data preparation techniques from 
reusing an ever-increasing amount of data available within research data 
repositories. Frictionless Data is an ongoing project at Open Knowledge 
International focused on removing this friction. We are doing this by 
developing a set of tools, specifications, and best practices for describing, 
publishing, and validating data. The heart of this project is the "Data 
Package", a containerization format for data based on existing practices for 
publishing open source software. This paper will report on current progress 
toward that goal. 
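
As a rough illustration of the "Data Package" idea mentioned in this abstract, the sketch below builds a small descriptor of the kind usually stored as datapackage.json: a package name plus a list of resources, each pointing to a data file and describing its columns. The package name, file name, and column names are hypothetical, and the missing_keys helper is a deliberately minimal check written for this example; it is not part of the Frictionless Data tooling.

```python
import json

# Hypothetical descriptor for illustration only; the names and values below
# are assumptions and do not come from the paper.
descriptor = {
    "name": "example-measurements",
    "resources": [
        {
            "name": "measurements",
            "path": "measurements.csv",
            "schema": {
                "fields": [
                    {"name": "site", "type": "string"},
                    {"name": "temperature_c", "type": "number"},
                ]
            },
        }
    ],
}


def missing_keys(package):
    """Return a list of problems found by a deliberately minimal check."""
    problems = []
    resources = package.get("resources")
    if not isinstance(resources, list) or not resources:
        problems.append("package has no resources")
        return problems
    for index, resource in enumerate(resources):
        # Each resource should point to a file or carry inline data.
        if "path" not in resource and "data" not in resource:
            problems.append(f"resource {index} has neither 'path' nor 'data'")
    return problems


# Serialize the descriptor as it might be written to datapackage.json.
text = json.dumps(descriptor, indent=2)
```

Keeping the description in a plain JSON file next to the data means the column names and types travel with the files, which is the sort of friction reduction the abstract describes.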

combinatorial chemistry, and the Grid. CombeChem instigated a range of
activities that have since been underway for more than ten years, in many 
ways matching the expansion of interest in using the Web as a vehicle for 
collection, curation, dissemination, reuse, and exploitation of scientific data 
and information. Chemistry has frequently provided the exemplar case 
studies, notably for the series of projects—funded by Jisc and EPSRC—that
investigated the issues associated with the long-term preservation of data to
support the scholarly knowledge cycle, such as the eBank UK project. 


Rapid developments in Internet access and mobile technology have 
significantly influenced the way researchers view connectivity, data 
standards, and the increasing importance and power of semantics and the 
Semantic Web. These technical advances interact strongly with the social 
dimension and have led to a reconsideration of the responsibilities of 
researchers for the quality of their research and for satisfying the requirements 
of modern stakeholders. Such obligations have given rise to discussions about 
Open Access and Open Data, creating a range of alternatives that are now 
technically feasible but need to be socially acceptable. Business plans are 
changing too, but in a strange contradiction, desire can run ahead of what is 
possible, sensible, and affordable, while lagging behind in imagination of 
what would be technically possible and potentially game-changing! 


Taking the chemical sciences as our example and focusing on the curation of 
research data, we explore from our perspective, ten years back and ten years 
forward, how far we have been able to re-imagine the data/information value 
pathway from bench to publication. We assess not only the major advances 
and changes that have been achieved, but also where we have been less 
successful than we might have hoped. We explore the directions for the 
future, based on what is clearly already possible and on what we can envisage 
becoming feasible in the near future. 

The Polar Data Catalogue (PDC) is a growing Canadian archive and public
access portal for Arctic and Antarctic research and monitoring data. In 
partnership with a variety of Canadian and international multi-sector research 
programs, the PDC encompasses the natural, social, and health sciences. 
From its inception, the PDC has adopted international standards and best 
practices to provide a robust infrastructure for reliable security, storage, 
discoverability, and access to Canada's polar data and metadata. Current 
efforts focus on developing new partnerships and incentives for data 
archiving and sharing and on expanding connections to other data centres 
through metadata interoperability protocols. 

The majority of information about science, culture, society, economy and the
environment is born digital, yet the underlying technology is subject to rapid 
obsolescence. One solution to this obsolescence, format migration, is widely 
practiced and supported by many software packages, yet migration has well 
known risks. For example, newer formats—even where similar in function— 
do not generally support all of the features of their predecessors, and, where 
similar features exist, there may be significant differences of interpretation. 


There appears to be a conflict between the wide use of migration and its 
known risks. In this paper we explore a simple hypothesis—that, where 

information about potential migration mismatches and, using custom tools,
evaluate a large collection of data files for the incidence of these risks. Our 
results support our initial hypothesis, though with some caveats. Further, we 
found that writing a tool to identify "risky" format features is considerably 
easier than writing a migration tool. 

Research data is increasingly perceived as a valuable resource and, with
appropriate curation and preservation, it has much to offer learning, teaching, 
research, knowledge transfer and consultancy activities in the visual arts. 
However, very little is known about the curation and preservation of this data: 
none of the specialist arts institutions have research data management policies 
or infrastructure and anecdotal evidence suggests that practice is ad hoc, left 
to individual researchers and teams with little support or guidance. In 
addition, the curation and preservation of such diverse and complex digital 
resources as found in the visual arts is, in itself, challenging. Led by the 
Visual Arts Data Service, a research centre of the University for the Creative 
Arts, in collaboration with the Glasgow School of Art; Goldsmiths College, 
University of London; and University of the Arts London, and funded by 
JISC, the KAPTUR project (2011-2013) seeks to address the lack of 
awareness and explore the potential of research data management systems in 
the arts by discovering the nature of research data in the visual arts, 
investigating the current state of research data management, developing a 
model of best practice applicable to both specialist arts institutions and arts 
departments in multidisciplinary institutions, and by applying, testing and 
piloting the model with the four institutional partners. Utilising the findings of 
the KAPTUR user requirement and technical review, this paper will outline 
the method and selection of an appropriate research data management system 
for the visual arts and the issues the team encountered along the way. 

Over a 12-year period, the Atlantic Philanthropies invested more than €127m
in agencies and community groups, running 52 prevention and early 
intervention (PEI) programmes and services in the children and youth sector 
throughout Ireland. As a condition of this funding, each PEI programme was 
evaluated by a university-based research team, resulting in a substantial 
collection of metric and qualitative information about ways to improve the 
lives of vulnerable Irish families. In 2016, the Atlantic Philanthropies funded 
the Prevention and Early Intervention Research Initiative at the Children's 
Research Network of Ireland and Northern Ireland (hereafter, the Initiative) to 
gather, prepare and share this evaluation data through the public data 
archives. 


The Initiative faces several challenges in its objective to archive this extensive 
collection of legacy data, and this paper will present two of the more salient 
challenges: how to share this data so that it is both (1) meaningful and (2) 
ethical. The paper pays particular attention to the challenges of safely sharing 
evaluation data through anonymisation and restricted access conditions; and 
also, the practical and ethical challenges of retroactively preparing these 
datasets for the archive. 


A series of publicly available documents that guide each stage of the Initiative 
are in development, and are emerging as a key output. This paper will 
describe two pivotal documents, namely the CRN-PEI Guiding Principles, 
and the CRN-PEI Protocols for preparing and archiving evaluation data. The 
CRN-PEI Guiding Principles outline the key legal and ethical obligations of 
archiving this legacy evaluation data, and act as a moral compass to steer our
progress through these uncharted waters. The CRN-PEI Protocols define the 
standards for how data included in the Initiative is prepared for deposition in 
the public data archives, so they are easily located, interpretable and 
comparable in the long term. This protocol is based upon best practice 
documentation from a number of international sources and our primary aim is 
to generate 'safe, useful data' (Elliot et al., 2016).

DMPonline is a web-based tool to help researchers and research support staff
produce data management and sharing plans. Between October and December 
2012, we examined DMPonline in unprecedented detail. The results of this 
evaluation led to some major changes. We have shortened the DCC Checklist 
for a Data Management Plan and revised how this is used in the tool. We have 
also amended the data model for DMPonline, improved workflows and 
redesigned the user interface. 


This paper reports on the evaluation, outlining the methods used, the results 
gathered and how they have been acted upon. We conducted usability testing 
on v.3 of DMPonline and the v.4 beta prior to release. The results from these 
two rounds of usability testing are compared to validate the changes made. 
We also put forward future plans for a more iterative development approach 
and greater community input. 

Academic libraries have a critical role to play as data quality hubs on campus.
There is an increased need to ensure data quality within 'e-science'. Given
academic libraries’ curation and preservation expertise, libraries are well 
suited to support the data quality process. Data quality measurements are 
discussed, including the fundamental elements of trust, authenticity, 
understandability, usability and integrity, and are applied to the Digital 
Curation Lifecycle model to demonstrate how these measures can be used to 
understand and evaluate data quality within the curatorial process. 
Opportunities for improvement and challenges are identified as areas that are 
fruitful for future research and exploration. 

From January 2014, Psychological Science introduced new submission
guidelines that encouraged the use of effect sizes, estimation, and meta- 
analysis (the "new statistics"), required extra detail of methods, and offered 
badges for use of open science practices. We investigated the use of these 
practices in empirical articles published by Psychological Science and, for 
comparison, by the Journal of Experimental Psychology: General, during the
period of January 2013 to December 2015. The use of null hypothesis 
significance testing (NHST) was extremely high at all times and in both 
journals. In Psychological Science, the use of confidence intervals increased 
markedly overall, from 28% of articles in 2013 to 70% in 2015, as did the 
availability of open data (3 to 39%) and open materials (7 to 31%). The other 
journal showed smaller or much smaller changes. Our findings suggest that 
journal-specific submission guidelines may encourage desirable changes in 
authors’ practices. 

While interest in research data management (RDM) services has grown,
clarifying the path between traditional library responsibilities and RDM 
remains a challenge. While the literature has provided ideas about services 
and student-/researcher-focused data information literacy (DIL) 
competencies, nothing has yet brought these skill sets together to provide a 
pathway for librarians engaging in RDM. The Data Engagement 
Opportunities scaffold was developed to provide a strategic trajectory relating 
information science skills, the DIL competencies, the stages of the data life 
cycle, three levels of RDM engagement activities, and potential measurable 
outcomes. This scaffold provides direction for librarians looking to identify 
their current abilities and explore new opportunities. 

As data as a scholarly object continues to grow in importance in the research
community, librarians are undertaking increasing responsibilities regarding 
data management and curation. New library initiatives include assisting 
researchers in finding data sets for reuse; locating and hosting repositories for 
required archiving; consultations on workflow, data management plans, and 
best practices; responding to changing funder policies (Whitmire, et al. 2015) 
and development of department or institutional policies. Librarians looking to 
provide services or expand into these areas will need both foundational 
resources and information about engaging the network of librarians exploring 
data. This webliography is intended for librarians seeking to enhance their 
own knowledge and assist peers in improving their data management
awareness. 


Goben, Abigail, and Dorothea Salo. "Federal Research Data Requirements Set to 
Change." College & Research Libraries News 74, no. 8 (2013): 421-425. 
https://doi.org/10.5860/crln.74.8.8994


Goldman, Julie, Donna Kafel, and Elaine R. Martin. "Assessment of Data 
Management Services at New England Region Resource Libraries." Journal of 
eScience Librarianship 4, no. 1 (2015): e1068. 
https://doi.org/10.7191/jeslib.2015.1068 


Goldstein, Justin C., Matthew S. Mayernik, and Hampapuram K. Ramapriyan. 
"Identifiers for Earth Science Data Sets: Where We Have Been and Where We 
Need to Go." Data Science Journal 16, no. 23 (2017).
http://doi.org/10.5334/dsj-2017-023


Considerable attention has been devoted to the use of persistent identifiers for 
assets of interest to scientific and other communities alike over the last two 
decades. Among persistent identifiers, Digital Object Identifiers (DOIs) stand 
out quite prominently, with approximately 133 million DOIs assigned to 
various objects as of February 2017. While the assignment of DOIs to objects 
such as scientific publications has been in place for many years, their 
assignment to Earth science data sets is more recent. Applying persistent 
identifiers to data sets enables improved tracking of their use and reuse, 
facilitates the crediting of data producers, and aids reproducibility through 
associating research with the exact data set(s) used. Maintaining provenance,
i.e., tracing back lineage of significant scientific conclusions to the entities
(data sets, algorithms, instruments, satellites, etc.) that lead to the conclusions, 
would be prohibitive without persistent identifiers. This paper provides a brief 
background on the use of persistent identifiers in general within the US, and 
DOIs more specifically. We examine their recent use for Earth science data 
sets, and outline successes and some remaining challenges. Among the 
challenges, for example, is the ability to conveniently and consistently obtain 
data citation statistics using the DOIs assigned by organizations that manage 
data sets. 

Feeding of Scientific Data." PLoS Computational Biology 10, no. 4 (2014):

This article offers a short guide to the steps scientists can take to ensure that
their data and associated analyses continue to be of value and to be 
recognized. In just the past few years, hundreds of scholarly papers and 
reports have been written on questions of data sharing, data provenance, 
research reproducibility, licensing, attribution, privacy, and more—but our 
goal here is not to review that literature. Instead, we present a short guide 
intended for researchers who want to know why it is important to "care for 
and feed" data, with some practical advice on how to do that. The final 
section at the close of this work (Links to Useful Resources) offers links to 
the types of services referred to throughout the text. 

(2015): eP1238. http://doi.org/10.7710/2162-3309.1238


INTRODUCTION New interest has arisen in organizing, preserving, and 
sharing the raw materials (the data and metadata) that undergird the published
products of research. Library and information scientists have valuable 
expertise to bring to bear in the effort to create larger, more diverse, and more 
widely used data repositories. However, for libraries to be maximally 
successful in providing the research data management and preservation 
services required of a successful data repository, librarians must work closely 
with researchers and learn about their data management workflows. 
DESCRIPTION OF SERVICES Databrary is a data repository that is closely 
linked to the needs of a specific scholarly community: researchers who use
video as a main source of data to study child development and learning. The 
project's success to date is a result of its focus on community outreach and 
providing services for scholarly communication, engaging institutional 
partners, offering services for data curation with the guidance of closely
involved information professionals, and the creation of a strong technical
infrastructure. NEXT STEPS Databrary plans to improve its curation tools 
that allow researchers to deposit their own data, enhance the user-facing 
feature set, increase integration with library systems, and implement strategies 
for long-term sustainability. 
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Over the last twenty years, a wide variety of resources have been developed 
to address the rights and licensing problems inherent with contemporary data 
sharing practices. The landscape of developments in this area is increasingly
confusing and difficult to navigate, due to the complexity of intellectual 
property and ethics issues associated with sharing sensitive data. This paper 
seeks to address this challenge, examining the landscape and presenting a 
Version 1.0 directory of resources. A multi-method study was pursued, with 
an environmental scan examining 20 resources, resulting in three high-level 
categories: standards, tools, and community initiatives; and a content analysis 
revealing the subcategories of rights, licensing, metadata & ontologies. A 
timeline confirms a shift in licensing standardization priorities from open data 
to more nuanced and technologically robust solutions, over time, to 
accommodate more sensitive data types. This paper reports on the
research undertaking, and comments on the potential for using
license-specific metadata supplements and developing data-centric rights and
licensing ontologies. 
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This article describes the adoption of a standard policy for the inclusion of 
data availability statements in all research articles published at the Nature 
family of journals, and the subsequent research which assessed the impacts 
that these policies had on authors, editors, and the availability of datasets. The
key findings of this research project include the determination of average and
median times required to add a data availability statement to an article; and a 
correlation between the way researchers make their data available, and the 
time required to add a data availability statement. 
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Throughout history, the life sciences have been revolutionised by 
technological advances; in our era this is manifested by advances in 
instrumentation for data generation, and consequently researchers now 
routinely handle large amounts of heterogeneous data in digital formats. The 
simultaneous transitions towards biology as a data science and towards a 'life
cycle' view of research data pose new challenges. Researchers face a
bewildering landscape of data management requirements, recommendations 
and regulations, without necessarily being able to access data management 
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assist in data management in their particular research domain. Here we 
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collaboration increased expectations that the Library will expand existing 
research data services to more investigators, so we have grown Library 
professionals’ internal competencies by providing research data management 
training opportunities to meet these demands. In addition, the Library's 
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INTRODUCTION As more and more research data becomes better and more 
easily available, data citation gains in importance. The management of 
research data has been high on the agenda in academia for more than five 
years. Nevertheless, not all data policies include data citation, and problems 
like versioning and granularity remain. SERVICE DESCRIPTION da|ra
operates as an allocation agency for DataCite and offers the registration 
service for social and economic research data in Germany. The service is 
jointly run by GESIS and ZBW, thereby merging experience in the fields of
Social Sciences and Economics. The authors answer questions pertaining to 
the most frequent aspects of research data registration like versioning and 
granularity as well as recommend the use of persistent identifiers linked with 
enriched metadata at the landing page. NEXT STEPS The promotion of data
sharing and the development of a citation culture among the scientific
community are future challenges. Interoperability becomes increasingly 
important for publishers and infrastructure providers. The already existent 
heterogeneity of services demands solutions for better user guidance. 
Building information competence is an asset of libraries, which can and 
should be expanded to research data. 
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The Nanomaterial Data Curation Initiative (NDCI), a project of the National 
Cancer Informatics Program Nanotechnology Working Group (NCIP 
NanoWG), explores the critical aspect of data curation within the 
development of informatics approaches to understanding nanomaterial 
behavior. Data repositories and tools for integrating and interrogating 
complex nanomaterial datasets are gaining widespread interest, with multiple 
projects now appearing in the US and the EU. Even in these early stages of 
development, a single common aspect shared across all nanoinformatics 
resources is that data must be curated into them. Through exploration of
sub-topics related to all activities necessary to enable, execute, and improve the
curation process, the NDCI will provide a substantive analysis of 
nanomaterial data curation itself, as well as a platform for multiple other 
important discussions to advance the field of nanoinformatics. This article 
outlines the NDCI project and lays the foundation for a series of papers on 
nanomaterial data curation. The NDCI purpose is to: 1) present and evaluate
the current state of nanomaterial data curation across the field on multiple
specific data curation topics, 2) propose ways to leverage and advance 
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undertaking of complex topics such as uncertainty, reproducibility, and 
interoperability is proposed as an important path to addressing key challenges 
within the nanomaterial community, such as reducing collateral negative 
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class of technologies. 
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resources scientists at the University of Minnesota. It examines data sharing 
rates, methods, and disciplinary differences and discusses the characteristics 
of researchers, data, methods, and aspects of data sharing across this group of 
disciplines. METHODS Data sharing practices are investigated by reviewing 
the two most recently published research articles (n=155) for each faculty 
member (n=78) in three departments at a single large research university. All 
mentions of data sharing in each publication were pursued in order to locate, 
analyze, and characterize shared data. RESULTS Seventy-two of 155 (46%)
articles indicated that related research data was publicly shared by some
method. The most prevalent method for data sharing was via journal websites, 
with 91% of data sharing articles using this method. Ecology, evolution, and 
behavior scientists shared data at the highest rate (70% of their articles), 
contrasting with fisheries, wildlife, and conservation biologists (18%), and 
forest resources (16%). DISCUSSION Differences between data sharing 
practices may be attributable to a range of influences: funder, journal, and 
institutional policies; disciplinary norms; and perceived or real rewards or 
incentives, as well as contrasting concerns, cost, or other barriers to sharing 
data. CONCLUSION Study results suggest differential approaches to data 
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Open data, FAIR (findable, accessible, interoperable and reusable) and 
research data management (RDM) are three overlapping but distinct concepts, 
each emphasizing different aspects of handling and sharing research data. 
They have different strengths in terms of informing and influencing how 
research data is treated, and there is much scope for enrichment of data if they 
are applied collectively. This paper explores the boundaries of each concept 
and where they intersect and overlap. As well as providing greater 
definitional clarity, this will help researchers to manage and share their data, 
and those supporting researchers, such as librarians and data stewards, to 
understand how these concepts can best be used in an advocacy setting. FAIR 
and open both focus on data sharing, ensuring content is made available in 
ways that promote access and reuse. Data management by contrast is about 
the stewardship of data from the point of conception onwards. It makes no 
assumptions about access, but is essential if data are to be meaningful to 
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this paper argues, a useful way to engage researchers and encourage good 
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The economic and societal benefits of making research data available for 
reuse and verification are now widely understood and accepted. However, 
there are some research studies, particularly those involving human 
participants, which face particular challenges in making their data openly 
available due to the sensitivities of the data. Despite its potential value to 
society this material is invariably kept locked away due to concerns over its 
inappropriate disclosure. The University of Bristol's Research Data Service 
has developed the institutional infrastructure, including policies and 
procedures, required to safely grant access to sensitive research data in a way 
that is transparent, secure, sustainable and crucially, replicable by other 
institutions. 


This paper looks at the background and challenges faced by the institution in 
dealing with sensitive data, outlines the approach taken and some of the 
outstanding issues to be tackled. 
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As scientific research and development become more collaborative, the 
diversity of skills and expertise involved in producing scientific data is
expanding as well. Since recognition of contribution has significant academic 
and professional impact for participants in scientific projects, it is important 
to integrate attribution and acknowledgement of scientific contributions into 
the research and data lifecycle. However, defining and clarifying 
contributions and the relationship of specific individuals and organizations 
can be challenging, especially when balancing the needs and interests of 
diverse partners. Designing an implementation method for attributing 
scientific contributions within complex projects that can allow ease of use and 
integration with existing documentation formats is another crucial 
consideration. 


To provide a versatile mechanism for organizing, documenting, and storing 
contributions to different types of scientific projects and their related 
products, an attribution and acknowledgement matrix and XML schema have 
been created as part of the Attribution and Acknowledgement Content 
Framework (AACF). Leveraging the taxonomies of contribution roles and 
types that have been developed and published previously, the authors 
consolidated 16 contribution types that could be considered and used when 
accrediting team members' contributions. Using these contribution types,
specific information regarding the contributing organizations and individuals 
can be documented using the AACF. 


This paper provides the background and motivations for creating the current 
version of the AACF Matrix and Schema, followed by demonstrations of the 
process and the results of using the Matrix and the Schema to record the 
contribution information of different sample datasets. The paper concludes by 
highlighting the key feedback and features to be examined in order to 
improve the next revisions of the Matrix and the Schema. 
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As scientific data volumes, format types, and sources increase rapidly with 
the invention and improvement of scientific capabilities, the resulting datasets 
are becoming more complex to manage as well. One of the significant 
management challenges is pulling apart the individual contributions of 
specific people and organizations within large, complex projects. This is 
important for two aspects: 1) assigning responsibility and accountability for 
scientific work, and 2) giving professional credit to individuals (e.g. hiring, 
promotion, and tenure) who work within such large projects. This paper aims 
to review the extant practice of data attribution and how it may be improved. 
Through a case study of creating a detailed attribution record for a climate 
model dataset, the paper evaluates the strengths and weaknesses of the current 
data attribution method and proposes an alternative attribution framework 
accordingly. The paper concludes by demonstrating that, analogous to 
acknowledging the different roles and responsibilities shown in movie credits, 
the methodology developed in the study could be used in general to identify 
and map out the relationships among the organizations and individuals who 
had contributed to a dataset. As a result, the framework could be applied to 
create data attribution for other dataset types beyond climate model datasets.
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2 (2017): 47-60. https://doi.org/10.2218/ijdc.v12i2.508


Effective management is a key component for preparing data to be retained 
for future long term access, use, and reuse by a broader community. 
Developing the skills to plan and perform data management tasks is important 
for individuals and institutions. Teaching data literacy skills may also help to 
mitigate the impact of data deluge and other effects of being overexposed to 
and overwhelmed by data. 


The process of learning how to manage data effectively for the entire research 
data lifecycle can be complex. There are often multiple stages involved within 
a lifecycle for managing data, and each stage may require specific knowledge, 
expertise, and resources. Additionally, although a range of organizations 
offers data management education and training resources, it can often be 
difficult to assess how effective the resources are for educating users to meet 
their data management requirements. 


In the case of Data Observation Network for Earth (DataONE), DataONE's 
extensive collaboration with individuals and organizations has informed the 
development of multiple educational resources. Through these interactions, 
DataONE understands that the process of creating and maintaining 
educational materials that remain responsive to community needs is reliant on 
careful evaluations. Therefore, the impetus for a comprehensive, customizable 
Education EVAluation instrument (EEVA) is grounded in the need for tools
to assess and improve current and future training and educational resources 
for research data management. 


In this paper, the authors outline and provide context for the background and 
motivations that led to creating EEVA for evaluating the effectiveness of data 
management educational resources. The paper details the process and results 
of the current version of EEVA. Finally, the paper highlights the key features, 
potential uses, and the next steps in order to improve future extensions and 
revisions of EEVA.
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no. 1 (2017): 65-71. https://doi.org/10.2218/ijdc.v12i1.531


To address the complexities researchers face during publication, and the 
potential community-wide benefits of wider adoption of clear data policies, 
the publisher Springer Nature has developed a standardised, common 
framework for the research data policies of all its journals. An expert working 
group was convened to audit and identify common features of research data 
policies of the journals published by Springer Nature, where policies were 
present. The group then consulted with approximately 30 editors, covering all 
research disciplines within the organisation. The group also consulted with 
academic editors, librarians and funders, which informed development of the 
framework and the creation of supporting resources. Four types of data policy 
were defined in recognition that some journals and research communities are 
more ready than others to adopt strong data policies. As of January 2017 more 
than 700 journals have adopted a standard policy and this number is growing 
weekly. To potentially enable standardisation and harmonisation of data 
policy across funders, institutions, repositories, societies and other publishers, 
the policy framework was made available under a Creative Commons license. 
However, the framework requires wider debate with these stakeholders and an
Interest Group within the Research Data Alliance (RDA) has been formed to
initiate this process. 
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This paper describes the plans and strategies to develop Portage, a national 
network of sustainable, shared services for research data management (RDM) 
in Canada. A description of the RDM context in Canada is provided. This 
environment has heightened expectations around the Government of Canada's 
Open Science plans and includes deliverables aimed at improving access to 
publications and data resulting from federally funded scientific activities. At 
the same time, a recent environmental scan published by Canada's three 
federal research granting councils reveals significant gaps in services, 
infrastructure, and funding mechanisms to support RDM. In addition, 
Canada's RDM environment consists of stakeholders from a variety of 
communities with minimal ongoing coordination or cooperation. 


The Portage network was conceived as a collaborative network model based 
on libraries’ strong connections with researchers across the disciplines, an 
ethos of curation and preservation, and experience with systems for managing 
data in all its forms. A pilot project provided Portage with a vision and set of 
principles, and identified several objectives as the small wins that would build 
the trust and shared understanding required for a successful network. Current 
services and activities of Portage, including a data management planning tool 
and an infrastructure project, are described in this paper. 
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Portage now faces the challenge of moving from project to operational 
network, and the challenge of establishing a sustainable governance model. 
CARL appointed a Steering Committee that will be proposing a full 
governance model at the conclusion of this transition period. Using a 
framework of factors identified in the literature, several relevant collaborative 
and network governance models are being explored. 


This paper outlines experience to date with Portage and matters under 
consideration for long-term sustainability, with a goal of engaging 
international colleagues in discussion and furthering the concepts for the 
benefit of RDM networks everywhere. 


Hunter, Jane. "Scientific Publication Packages—A Selective Approach to the 
Communication and Archival of Scientific Output." International Journal of 
Digital Curation 1, no. 1 (2006): 33-52. https://doi.org/10.2218/ijdc.v1i1.4


The use of digital technologies within research has led to a proliferation of 
data, many new forms of research output and new modes of presentation and 
analysis. Many scientific communities are struggling with the challenge of 
how to manage the terabytes of data and new forms of output they are
producing. They are also under increasing pressure from funding 
organizations to publish their raw data, in addition to their traditional 
publications, in open archives. In this paper I describe an approach that 
involves the selective encapsulation of raw data, derived products, algorithms, 
software and textual publications within "scientific publication packages". 
Such packages provide an ideal method for: encapsulating expert knowledge; 
for publishing and sharing scientific process and results; for teaching complex 
scientific concepts; and for the selective archival, curation and preservation of 
scientific data and output. They also provide a bridge between technological 
advances in the Digital Libraries and eScience domains. In particular, I 
describe the RDF-based architecture that we are adopting to enable scientists 
to construct, publish and manage “scientific publication packages” — 
compound digital objects that encapsulate and relate the raw data to its 
derived products, publications and the associated contextual, provenance and 
administrative metadata. 


Ikeshoji-Orlati, Veronica A., and Clifford B. Anderson. "Developing Data 
Curation Protocols for Digital Projects at Vanderbilt: Une Micro-Histoire." 
International Journal of Digital Curation 12, no. 2 (2017): 246-254.
https://doi.org/10.2218/ijdc.v12i2.574


This paper examines the intersection of legacy digital humanities projects and 
the ongoing development of research data management services at Vanderbilt 
University's Jean and Alexander Heard Library. Future directions for data 
management and curation protocols are explored through the lens of a case 
study: the (re)curation of data from an early 2000s e-edition of Raymond 
Poggenburg's Charles Baudelaire: Une Micro-histoire. The vagaries of 
applying the Library of Congress Metadata Object Description Schema 
(MODS) to the data and metadata of the Micro-histoire will be addressed. In 
addition, the balance between curating data and metadata for preservation vs. 
curating it for (re)use by future researchers is considered in order to suggest 
future avenues for holistic research data management services at Vanderbilt. 
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Pilot at the University of Manitoba: A Canadian Experience." Journal of eScience 
Librarianship 4, no. 2 (2015): e1061. https://doi.org/10.7191/jeslib.2014.1061 


Canada's federal funding agencies are following the directions of funding 
agencies in the United States and United Kingdom, and will soon require a 
data management plan in grant applications. The University of Manitoba 
Libraries in Canada has started planning and implementing research data 
services, and education is seen as a key component. In June 2014, the New 
England Collaborative Data Management Curriculum (NECDMC) (Lamar 
Soutter Library, University of Massachusetts Medical School 2014) was 
piloted and used to provide data management training for a group of subject 
librarians at the University of Manitoba Libraries, in combination with 
information about data-related policies of the Canadian funding agencies and 
the University of Manitoba. The seven NECDMC modules were delivered in 
a seminar style, with emphasis on group discussions and Canadian content. 
The benefits of NECDMC—adaptability and flexible framework—should be 
weighed against the challenges experienced in the pilot, mainly the significant 
amount of time needed to create local content and complement the existing 
curriculum. Overall, the pilot showed that NECDMC is a good, thorough 
introduction to data management, and that it is possible to adapt NECDMC to 
the local and Canadian settings in an effective way. 
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We report on a case study which examines the social science community's 
capability and institutional support for data management. Fourteen 
researchers were invited for an in-depth qualitative survey between June 2014 
and October 2015. We modify and adopt the Community Capability Model 
Framework (CCMF) profile tool to ask these scholars to self-assess their 
current data practices and whether their academic environment provides 
enough supportive infrastructure for data related activities. The exemplar 
disciplines in this report include anthropology, political sciences, and library 
and information science. 


Our findings deepen our understanding of social disciplines and identify 
capabilities that are well developed and those that are poorly developed. The 
participants reported that their institutions have made relatively slow progress 
on economic supports and data science training courses, but acknowledged 
that they are well informed and trained for participants’ privacy protection. 
The result confirms a prior observation from previous literature that social 
scientists are concerned with ethical perspectives but lack technical training 
and support. The results also demonstrate intra- and inter-disciplinary 
commonalities and differences in researcher perceptions of data-intensive 
capability, and highlight potential opportunities for the development and 
delivery of new and impactful research data management support services to 
social sciences researchers and faculty. 
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Service for UK Universities." Insights: The UKSG Journal 30, no. 1 (2017): 59-70. 
https://doi.org/10.1629/uksg.346 


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Science and Technology 42, no. 5 (2016): 38-40. 
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Johnson, Andrew M., and Shelley Knuth. "Data Management Plan Requirements 
for Campus Grant Competitions: Opportunities for Research Data Services 
Assessment and Outreach." Journal of eScience Librarianship 5, no. 1 (2016): 
e1089. https://doi.org/10.7191/jeslib.2016.1089


Objective: To examine the effects of research data services (RDS) on the 
quality of data management plans (DMPs) required for a campus-level faculty 
grant competition, as well as to explore opportunities that the local DMP 
requirement presented for RDS outreach. 


Methods: Nine reviewers each scored a randomly assigned portion of DMPs 
from 82 competition proposals. Each DMP was scored by three reviewers, 
and the three scores were averaged together to obtain the final score. 
Interrater reliability was measured using intraclass correlation. Unpaired
t-tests were used to compare mean DMP scores for faculty who utilized RDS
services with those who did not. Unpaired t-tests were also used to compare 
mean DMP scores for proposals that were funded with proposals that were 
not funded. One-way ANOVA was used to compare mean DMP scores 
among proposals from six broad disciplinary categories. 


Results: Analyses showed that RDS consultations had a statistically 
significant effect on DMP scores. Differences between DMP scores for 
funded versus unfunded proposals and among disciplinary categories were not 
significant. The DMP requirement also provided a number of both expected 
and unexpected outreach opportunities for RDS services. 


Conclusions: Requiring DMPs for campus grant competitions can provide 
important assessment and outreach opportunities for research data services. 
While these results might not be generalizable to DMP review processes at 
federal funding agencies, they do suggest the importance, at any level, of 
developing a shared understanding of what constitutes a high quality DMP 
among grant applicants, grant reviewers, and RDS providers. 


Johnson, Andrew W., and Megan M. Bresnahan. "DataDay!: Designing and 
Assessing a Research Data Workshop for Subject Librarians." Journal of 
Librarianship and Scholarly Communication 3, no. 2 (2015): eP1229. 
http://doi.org/10.7710/2162-3309.1229 
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BACKGROUND Many libraries have launched or adapted services to address 
the research data needs of campus faculty and students. At the University of 
Colorado Boulder (CU-Boulder), local demand for research data training 
emerged from a broader assessment of training needs for subject librarians. 
The findings from this assessment led to the development of a day-long 
workshop called DataDay! that aimed to expand and translate the skills of 
subject librarians into the context of research data support. DESCRIPTION 
OF PROGRAM The DataDay! workshop incorporated hands-on exercises 
with expert presentations, informal discussions, and print handouts. The 
workshop allowed participants to gain experience with activities like working 
with real data sets and developing materials for outreach about research data 
services. Several instruments were used to assess the workshop learning 
outcomes, which included changes in knowledge and comfort levels related to 
engaging in research data support. Assessment activities also measured how 
well participants applied concepts taught in the workshop to novel situations. 
NEXT STEPS Future research data training efforts for CU-Boulder librarians 
will be informed by the DataDay! workshop assessment results, and this 
workshop may provide a model for other institutions to use to train subject 
librarians to adapt to new roles in support of research data. There is also a 
need for the lessons learned from local training efforts like DataDay! to 
inform the development of resources to support the broader subject librarian 
community as their institutions launch and grow research data services. 
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Funders increasingly require that data sets arising from sponsored research 
must be preserved and shared, and many publishers either require or
encourage that data sets accompanying articles are made available through a
publicly accessible repository. Additionally, many researchers wish to make 
their data available regardless of funder requirements both to enhance their 
impact and also to propel the concept of open science. However, the data 
curation activities that support these preservation and sharing activities are 
costly, requiring advanced curation practices, training, specific technical 
competencies, and relevant subject expertise. Few colleges or universities will 
be able to hire and sustain all of the data curation expertise locally that its 
researchers will require, and even those with the means to do more will 
benefit from a collective approach that will allow them to supplement at peak 
times, access specialized capacity when infrequently-curated types arise, and 
stabilize service levels to account for local staff transition, such as during 
turn-over periods. The Data Curation Network (DCN) provides a solution for 
partners of all sizes to develop or to supplement local curation expertise with 
the expertise of a resilient, distributed network, and creates a funding stream 
to both sustain central services and support expansion of distributed expertise 
over time. This paper presents our next steps for piloting the DCN, scheduled 
to launch in the spring of 2018 across nine partner institutions. Our 
implementation plan is based on planning phase research performed from 
2016-2017 that monitored the types, disciplines, frequency, and curation 
needs of data sets passing through the curation services at the six planning 
phase institutions. Our DCN implementation plan includes a well-coordinated 
and tiered staffing model, a technology-agnostic submission workflow, 
standardized curation procedures, and a sustainability approach that will 
allow the DCN to prevail beyond the grant-supported implementation phase 
as a curation-as-service model. 
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This paper reviews developments in funders' data management and sharing 
policies, and explores the extent to which they have affected practice. The 
Digital Curation Centre has been monitoring UK research funders’ data 
policies since 2008. There have been significant developments in subsequent 
years, most notably the joint Research Councils UK's Common Principles on 
Data Policy and the Engineering and Physical Sciences Research Council's 
Policy Framework on Research Data. This paper charts these changes and 
highlights shifting emphases in the policies. Institutional data policies and 
infrastructure are increasingly being developed as a result of these changes. 
While action is clearly being taken, questions remain about whether the 
changes are affecting practice on the ground. 
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The Polar Data Centre (PDC) of the National Institute of Polar Research 
(NIPR) has a responsibility to manage polar science data as part of the 
National Antarctic Data Centre and the Science Committee on Antarctic 
Research. During the International Polar Year (IPY 2007-2008), a remarkable 
number of data/metadata involving multi-disciplinary science activities were 
compiled. Although the long-term stewardship of the accumulation of 
metadata falls to the data center of NIPR, the work has been in collaboration 
with the Global Change Master Directory, the Polar Information Commons, 
the World Data System and other data science bodies/communities under the 
International Council for Science. In addition, links with other data centers, 
such as the Data Integration and Analysis System Program of the Global 
Earth Observation System of Systems and the Polar Data Catalogue of 
Canada were initiated in 2014 using the Open Archives Initiative Protocol for 
Metadata Harvesting. The metadata compiled by the PDC were recently 
modified using an automatic attributing system and DataCite through the 
Japan Link Center. 
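
The links described in this abstract rely on OAI-PMH, in which a harvester sends HTTP requests that carry a `verb` parameter. As a minimal sketch only (not the PDC's actual configuration), the Python below builds a `ListRecords` request URL. The base URL, the `polar` set name, and the helper name `list_records_url` are hypothetical; `verb`, `metadataPrefix`, `set`, and the `oai_dc` prefix come from the OAI-PMH specification.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (sketch only; nothing is fetched)."""
    # verb and metadataPrefix are required arguments of a ListRecords request.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # The optional "set" argument restricts harvesting to one set of records.
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and set name, used purely for illustration.
    print(list_records_url("https://example.org/oai", set_spec="polar"))
```

A harvester would then fetch this URL and parse the XML response, following any resumption token the repository returns.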
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We present a case study of data integration and reuse involving 12 researchers 
who published datasets in Open Context, an online data publishing platform, 
as part of collaborative archaeological research on early domesticated animals 
in Anatolia. Our discussion reports on how different editorial and 
collaborative review processes improved data documentation and quality, and 
created ontology annotations needed for comparative analyses by domain 
specialists. To prepare data for shared analysis, this project adapted editor- 
supervised review and revision processes familiar to conventional publishing, 
as well as more novel models of revision adapted from open source software 
development of public version control. Preparing the datasets for publication 
and analysis required significant investment of effort and expertise, including 
archaeological domain knowledge and familiarity with key ontologies. To 
organize this work effectively, we emphasized these different models of 
collaboration at various stages of this data publication and analysis project. 
Collaboration first centered on data editors working with data contributors, 
then widened to include other researchers who provided additional peer- 
review feedback, and finally the widest research community, whose 
collaboration is facilitated by GitHub's version control system. We 
demonstrate that the "publish" and "push" models of data dissemination need 
not be mutually exclusive; on the contrary, they can play complementary 
roles in sharing high quality data in support of research. This work highlights 
the value of combining multiple models in different stages of data 
dissemination. 
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INTRODUCTION The practice of publishing supplementary materials with 
journal articles is becoming increasingly prevalent across the sciences. We 
sought to understand better the content of these materials by investigating the 
differences between the supplementary materials published by authors in the 
geosciences and plant sciences. METHODS We conducted a random 
stratified sampling of four articles from each of 30 journals published in 2013. 
In total, we examined 297 supplementary data files for a range of different 
factors. RESULTS We identified many similarities between the practices of 
authors in the two fields, including the formats used (Word documents, Excel 
spreadsheets, PDFs) and the small size of the files. There were differences 
identified in the content of the supplementary materials: the geology materials 
contained more maps and machine-readable data; the plant science materials 
included much more tabular data and multimedia content. DISCUSSION Our 
results suggest that the data shared through supplementary files in these fields 
may not lend itself to reuse. Code and related scripts are not often shared, nor 
is much 'raw' data. Instead, the files often contain summary data, modified for 
human reading and use. CONCLUSION Given these and other differences, 
our results suggest implications for publishers, librarians, and authors, and 
may require shifts in behavior if effective data sharing is to be realized. 
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analysis of the data, which led to the development of this framework. The 
proposed framework is largely based on the Open Archival Information 
System (OAIS) functional model and caters for the curation of both analogue 
and digital data. 
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place for peer review and formal citation of datasets. Data publication is 
becoming increasingly important to the scientific community, as it will 
provide a mechanism for those who create data to receive academic credit for 
their work and will allow the conclusions arising from an analysis to be more 
readily verifiable, thus promoting transparency in the scientific process. Peer 
review of data will also provide a mechanism for ensuring the quality of 
datasets, and we provide suggestions on the types of activities one expects to 
see in the peer review of data. A simple taxonomy of data publication 
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In the United States, research funded by the government produces a 
significant portion of data. US law mandates that these data should be freely 
available to the public through ‘public access’, which is defined as fully 
discoverable and usable by the public. The U.S. government executive branch 
supported the public access requirements by issuing an Executive Directive 
titled ‘Increasing Access to the Results of Federally Funded Scientific 
Research’ that required federal agencies with annual research and 
development expenditures of more than $100 million to create public access 
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by the Executive Order 'Making Open and Machine Readable the New 
Default for Government Information’ which was accompanied by a 
memorandum with specific guidelines for information management and 
instructions to find ways to reduce compliance costs through interagency 
cooperation. 
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gap analysis of continuing education and readiness assessment of the 
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The importance of managing research data has been emphasized by the 
government, funding agencies, and scholarly communities. Increased access 
to research data increases the impact and efficiency of scientific activities and 
funding. Thus, many research institutions have established or plan to establish 
research data curation services as part of their Institutional Repositories (IRs). 
However, in order to design effective research data curation services in IRs, 
and to build active research data providers and user communities around those 
IRs, it is essential to study current data curation practices and provide rich 
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practices. Based on 13 interviews with 15 IR staff members from 13 large 
research universities in the United States, this paper provides a rich, 
qualitative description of research data curation and use practices in IRs. In 
particular, the paper identifies data curation and use activities in IRs, as well 
as their structures, roles played, skills needed, contradictions and problems 
present, solutions sought, and workarounds applied. The paper can inform the 
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research and the conceptualization of data that underpins it. The 
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shift in the conceptualization of data as research materials and sources of 
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them and to defend specific interpretations. The presentation of data, the way 
they are identified, selected and included (or excluded) in databases and the 
information provided to users to re-contextualize them are fundamental to 
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around interpreting data and assessing their quality can be tackled by 
cultivating governance strategies around how data are collected, managed and 
processed. 
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Research data is being generated at an ever-increasing rate. This brings 
challenges in how to store, analyse, and care for the data. A component of this 
problem is the stewardship of data and associated files that need a safe and 
secure home for the medium to long-term. 
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are provided with large allocations of ‘active data storage’. This is often stored 
on expensive and fast disks to enable efficient transfer and working with large 
amounts of data. However, over time this active data store fills up, and 
researchers need a facility to move older but still valuable data to cheaper 
storage for long-term care. In addition, research funders are increasingly 
requiring data to be stored in forms that allow it to be described and retrieved 
in the future. For data that can't be shared publicly in an open repository, a 
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cost efficiency. 


This paper describes a solution to these requirements, called the Data Vault. 
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It is widely acknowledged that data sharing has great potential for scientific 
progress. However, so far making data available has little impact on a 
researcher’s reputation. Thus, data sharing can be conceptualized as a social 
dilemma. In the presented study we investigated the influence of the 
researcher's personality within the social dilemma of data sharing. The 
theoretical background was the appropriateness framework. We conducted a 
survey among 1564 researchers about data sharing, which also included 
standardized questions on selected personality factors, namely the so-called 
Big Five, Machiavellianism and social desirability. Using regression analysis, 
we investigated how these personality domains relate to four groups of 
dependent variables: attitudes towards data sharing, the importance of factors 
that might foster or hinder data sharing, the willingness to share data, and 
actual data sharing. Our analyses showed the predictive value of personality 
for all four groups of dependent variables. However, there was not a global 
consistent pattern of influence, but rather different compositions of effects. 
Our results indicate that the implications of data sharing are dependent on 
age, gender, and personality. In order to foster data sharing, it seems 
advantageous to provide more personal incentives and to address the 
researchers’ individual responsibility. 
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In the German social and economic sciences there is a growing awareness of 
flexible data distribution and research data reuse, especially as increasing 
numbers of research funders recommend publishing research data as the basis 
for scientific insight. However, a data-sharing mentality has not yet been 
established in Germany attributable to researchers’ strong reservations about 
publishing their data. This attitude is exacerbated by the fact that, at present, 
there is no trusted national data sharing repository that covers the particular 
requirements of institutions regarding research data. This article discusses 
how this objective can be achieved with the project initiative SowiDataNet. 
The development of a community-driven data repository is a logically 
consistent and important step towards an attitude shift concerning data 
sharing in the social and economic sciences. 


Littauer, Richard, Karthik Ram, Bertram Ludascher, William Michener, and 
Rebecca Koskela. "Trends in Use of Scientific Workflows: Insights from a Public 
Repository and Recommendations for Best Practice." International Journal of 
Digital Curation 7, no. 2 (2012): 92-100. https://doi.org/10.2218/ijdc.v7i2.232 


Scientific workflows are typically used to automate the processing, analysis 
and management of scientific data. Most scientific workflow programs 
provide a user-friendly graphical user interface that enables scientists to more 
easily create and visualize complex workflows that may be comprised of 
dozens of processing and analytical steps. Furthermore, many workflows 
provide mechanisms for tracing provenance and methodologies that foster 
reproducible science. Despite their potential for enabling science, few studies 
have examined how the process of creating, executing, and sharing workflows 
can be improved. In order to promote open discourse and access to scientific 
methods as well as data, we analyzed a wide variety of workflow systems and 
publicly available workflows on the public repository myExperiment. It is 
hoped that understanding the usage of workflows and developing a set of 
recommended best practices will lead to increased contribution of workflows 
to the public domain. 
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Objective: Best practices such as the FAIR Principles (Findability, 
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review by domain experts to evaluate the reusability of the datasets in our 
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The South African Network of Data and Information Curation Communities 
(NeDICC) was formed to promote the development and use of standards and 
best practices among South African data stewards and data librarians 
(NeDICC, 2015). The steering committee has members from various South 
African HEIs and research councils. As part of their service offerings 
NeDICC arranges seminars, workshops and conferences to promote 
awareness regarding digital curation. NeDICC has contributed to the increase 
in awareness, and growth of knowledge, on the subject of digital and data 
curation in South Africa (Kahn et al., 2014). NeDICC members are involved in 
the UP M.IT and Continued Professional Development training, and serve as 
external examiners for the UCT M.Phil in Digital Curation degree. NeDICC 
is responsible for the Research Data Management track at the annual 
e-Research conference in SA and develops an annual training-focussed 
programme to provide workshop opportunities with both SA and foreign 
trainers. This paper specifically addresses the efforts by this community to 
mobilise and upskill South African librarians so that they would be willing 
and able to provide the necessary RDM services that would strengthen the 
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This paper reports on the design, delivery and assessment of a model for 
internal library education around research data management (RDM). 
Conducted at the University of Pittsburgh Library System (ULS), the exercise 
and resultant instructional session employed an active learning approach, in 
which a group of librarians and archivists explored data issues and 
conventions in a discipline of their own selection and presented their findings 
to an audience of library colleagues. In this paper, we put forth an adaptable 
active learning model for internal RDM education and offer guidance for its 
implementation by peer libraries that are similarly building internal capacity 
for the design and delivery of RDM services that are responsive to 
disciplinary needs. 
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In this paper, we present the Core Scientific Metadata Model (CSMD), a 
model for the representation of scientific study metadata developed within the 
Science & Technology Facilities Council (STFC) to represent the data 
generated from scientific facilities. The model has been developed to allow 
management of and access to the data resources of the facilities in a uniform 
way, although we believe that the model has wider application, especially in 
areas of "structural science" such as chemistry, materials science and earth 
sciences. We give some motivations behind the development of the model, 
and an overview of its major structural elements, centred on the notion of a 
scientific study formed by a collection of specific investigations. We give 
some details of the model, with the description of each investigation 
associated with a particular experiment on a sample generating data, and the 
associated data holdings are then mapped to the investigation with the 
appropriate parameters. We then go on to discuss the instantiation of the 
metadata model within a production quality data management infrastructure, 
the Information CATalogue (ICAT), which has been developed within STFC 
for use in large-scale photon and neutron sources. Finally, we give an 
overview of the relationship between CSMD, and other initiatives, and give 
some directions for future developments. 


Mattmann, C., D. J. Crichton, A. F. Hart, S. C. Kelly, and J. S. Hughes. 
"Experiments with Storage and Preservation of NASA's Planetary Data via the 
Cloud." IT Professional 12, no. 5 (2010): 28-35. 
https://doi.org/10.1109/mitp.2010.97 


Mauthner, Natasha Susan, and Odette Parry. "Open Access Digital Data Sharing: 
Principles, Policies and Practices." Social Epistemology: A Journal of Knowledge, 
Culture and Policy 27, no. 1 (2013): 47-67. 
https://doi.org/10.1080/02691728.2012.760663 


Mayernik, Matthew S. "Data Citation Initiatives and Issues." Bulletin of the 
American Society for Information Science and Technology 38, no. 5 (2012): 23-28. 
https://doi.org/10.1002/bult.2012.1720380508 





Mayernik, Matthew S. "Research Data and Metadata Curation as Institutional 
Issues." Journal of the Association for Information Science and Technology 67, 
no. 4 (2015): 973-993. https://doi.org/10.1002/asi.23425 


Mayernik, Matthew S., Sarah Callaghan, Roland Leigh, Jonathan Tedds, and 
Steven Worley. "Peer Review of Datasets: When, Why, and How." Bulletin of the 
American Meteorological Society 96 (2015): 191-201. 
http://dx.doi.org/10.1175/BAMS-D-13-00083.1 


Mayernik, Matthew S., G. Sayeed Choudhury, Tim DiLauro, Elliot Metsger, 
Barbara Pralle, Mike Rippin, and Ruth Duerr. "The Data Conservancy Instance: 
Infrastructure and Organizational Services for Research Data Curation." D-Lib 
Magazine 18, no. 9/10 (2012). https://doi.org/10.1045/september2012-mayernik 


Mayernik, Matthew S., Tim DiLauro, Ruth Duerr, Elliot Metsger, Anne E. 
Thessen, and G. Sayeed Choudhury. "Data Conservancy Provenance, Context, and 
Lineage Services: Key Components for Data Preservation and Curation." Data 
Science Journal 12 (2013): 158-171. https://doi.org/10.2481/dsj.12-039 


Among the key services that institutional data management infrastructures 
must provide are provenance and lineage tracking and the ability to associate 
data with contextual information needed for understanding and use. These 
functionalities are critical for addressing a number of key issues faced by data 
collectors and users, including trust in data, results traceability, data 
transparency, and data citation support. In this paper, we describe the support 
for these services within the Data Conservancy Service (DCS) software. The 
DCS provenance, context, and lineage services cross the four layers in the 
DCS data curation stack model: storage, archiving, preservation, and curation. 


Mayernik, Matthew S., Jennifer Phillips, and Eric Nienhouse. "Linking 
Publications and Data: Challenges, Trends, and Opportunities." D-Lib Magazine 
22, no. 5/6 (2016). https://doi.org/10.1045/may2016-mayernik 


Mayo, Christine, Todd J. Vision, and Elizabeth A. Hull. "The Location of the 
Citation: Changing Practices in How Publications Cite Original Data in the Dryad 
Digital Repository." International Journal of Digital Curation 11, no. 1 (2016): 
150-155. https://doi.org/10.2218/ijdc.v11i1.400 


While stakeholders in scholarly communication generally agree on the 
importance of data citation, there is not consensus on where those citations 
should be placed within the publication—particularly when the publication is 
citing original data. Recently, CrossRef and the Digital Curation Center 
(DCC) have recommended as a best practice that original data citations 
appear in the works cited sections of the article. In some fields, such as the 
life sciences, this contrasts with the common practice of only listing data 
identifier(s) within the article body (intratextually). We inquired whether data 
citation practice has been changing in light of the guidance from CrossRef 
and the DCC. We examined data citation practices from 2011 to 2014 in a 
corpus of 1,125 articles associated with original data in the Dryad Digital 
Repository. The percentage of articles that include no reference to the original 
data has declined each year, from 31% in 2011 to 15% in 2014. The 
percentage of articles that include data identifiers intratextually has grown 
from 69% to 83%, while the percentage that cite data in the works cited 
section has grown from 5% to 8%. If the proportions continue to grow at the 
current rate of 19-20% annually, the proportion of articles with data citations 
in the works cited section will not exceed 90% until 2030. 
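
The projection at the end of this abstract is a compound-growth calculation. As a rough sketch only, the Python below counts how many years a share starting at 8% (the 2014 works-cited figure) needs to pass 90% at a fixed annual growth rate. The function name and the assumption of a constant relative growth rate are illustrative; the authors' exact calculation is not given in the abstract and may differ.

```python
def first_year_exceeding(start_year, start_share, target_share, annual_growth):
    """Return the first year in which the share exceeds target_share,
    assuming it grows by a constant relative rate each year (an assumption)."""
    if start_share <= 0 or annual_growth <= 0:
        raise ValueError("start_share and annual_growth must be positive")
    year, share = start_year, start_share
    while share <= target_share:
        share *= 1 + annual_growth  # compound one more year of growth
        year += 1
    return year


if __name__ == "__main__":
    # Shares are in percent; rates span the 19-20% range quoted in the abstract.
    for rate in (0.19, 0.20):
        print(rate, first_year_exceeding(2014, 8.0, 90.0, rate))
```

Under this simplified model the 90% threshold is crossed in the late 2020s for rates of 19-20%, and in 2030 for a rate of about 17% (roughly the rate implied by growth from 5% to 8% over three years), which is broadly in line with the abstract's estimate.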
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Genomic and environmental sciences represent two poles of scientific data. In 
the first, highly parallel sequencing facilities generate large quantities of 
sequence data. In the latter, loosely networked remote and field sensors 
produce intermittent streams of different data types. Yet both genomic and 
environmental sciences are said to be moving to data intensive research. This 
paper explores and contrasts data flow in these two domains in order to better 
understand how data intensive research is being done. Our case studies are 
next generation sequencing for genomics and environmental networked 
sensors. 


Our objective was to enrich understanding of the 'intensive' processes and 
properties of data intensive research through a ‘sociology’ of data using 
methods that capture the relational properties of data flows. Our key 
methodological innovation was the staging of events for practitioners with 
different kinds of expertise in data intensive research to participate in the 
collective annotation of visual forms. Through such events we built a 
substantial digital data archive of our own that we then analysed in terms of 
three traits of data flow: durability, replicability and metrology. 
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Our findings are that analysing data flow with respect to these three traits 
provides better insight into how doing data intensive research involves 
people, infrastructures, practices, things, knowledge and institutions. 
Collectively, these elements shape the topography of data and condition how 
it flows. We argue that although much attention is given to phenomena such 
as the scale, volume and speed of data in data intensive research, these are 
measures of what we call 'extensive' properties rather than intensive ones. Our 
thesis is that extensive changes, that is to say those that result in non-linear 
changes in metrics, can be seen to result from intensive changes that bring 
multiple, disparate flows into confluence. 


If extensive shifts in the modalities of data flow do indeed come from the 
alignment of disparate things, as we suggest, then we advocate the staging of 
workshops and other events with the purpose of developing the 'missing' 
metrics of data flow. 
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The National Imaging Facility (NIF) provides Australian researchers with 
state-of-the-art instrumentation, including magnetic resonance imaging 
(MRI), positron emission tomography (PET), X-ray computed tomography 
(CT) and multispectral imaging, and expertise for the characterisation of 
animals, plants and materials. 


To maximise research outcomes, as well as to facilitate collaboration and 
sharing, it is essential not only that the data acquired using these instruments 
be managed, curated and archived in a trusted data repository service, but also 
that the data itself be of verifiable quality. In 2017, several NIF nodes 
collaborated on a national project to define the requirements and best 
practices necessary to achieve this, and to establish exemplar services for both 
preclinical MRI data and clinical ataxia MRI data. 
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In this paper we describe the project, its key outcomes, challenges and lessons 
learned, and future developments, including extension to other 
characterisation facilities and instruments/modalities. 
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Digital information is a vital resource in our knowledge economy, valuable 
for research and education, science and the humanities, creative and cultural 
activities, and public policy (The Blue Ribbon Task Force on Sustainable 
Digital Preservation and Access, 2010). New high-throughput instruments, 
telescopes, satellites, accelerators, supercomputers, sensor networks, and 
running simulations are generating massive amounts of data (Thanos, 2011). 
These data are used by decision makers for improving the quality of life of 
citizens. Moreover, researchers are employing sophisticated technologies to 
analyse these data to address questions that were unapproachable just a few 
years ago (Helbing & Balietti, 2011). Digital technologies have fostered a 
new world of research characterized by immense datasets, unprecedented 
levels of openness among researchers, and new connections among 
researchers, policy makers, and the public (The National Academy of 
Sciences, 2009). 
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Data sharing is the practice of making data available for use by others. 
Ecologists are increasingly generating and sharing an immense volume of 
data. Such data may serve to augment existing data collections and can be 
used for synthesis efforts such as meta-analysis, for parameterizing models, 
and for verifying research results (i.e., study reproducibility). Large volumes 
of ecological data may be readily available through institutions or data 
repositories that are the most comprehensive available and can serve as the 
core of ecological analysis. Ecological data are also employed outside the 
research context and are used for decision-making, natural resource 
management, education, and other purposes. Data sharing has a long history 
in many domains such as oceanography and the biodiversity sciences (e.g., 
taxonomic data and museum specimens), but has emerged relatively recently 
in the ecological sciences. 
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A review of several of the large international and national ecological research 
programs that have emerged since the mid-1900s highlights the initial failures 
and more recent successes as well as the underlying causes, from a near 
absence of effective policies to the emergence of community and data sharing 
policies coupled with the development and adoption of data and metadata 
standards and enabling tools. Sociocultural change and the move towards 
more open science have evolved more rapidly over the past two decades in 
response to new requirements set forth by governmental organizations, 
publishers and professional societies. As the scientific culture has changed so 
has the cyberinfrastructure landscape. The introduction of community-based 
data repositories, data and metadata standards, software tools, persistent 
identifiers, and federated search and discovery have all helped promulgate 
data sharing. Nevertheless, there are many challenges and opportunities 
especially as we move towards more open science. Cyberinfrastructure 
challenges include a paucity of easy-to-use metadata management systems, 
significant difficulties in assessing data quality and provenance, and an 
absence of analytical and visualization approaches that facilitate data 
integration and harmonization. Challenges and opportunities abound in the 
sociocultural arena where funders, researchers, and publishers all have a stake 
in clarifying policies, roles and responsibilities, as well as in incentivizing 
data sharing. A set of best practices and examples of software tools are 
presented that can enable research transparency, reproducibility and new 
knowledge by facilitating idea generation, research planning, data 
management and the dissemination of data and results. 


Michener, William K., Suzie Allard, Amber Budden, Robert B. Cook, Kimberly 
Douglass, Mike Frame, Steve Kelling, Rebecca Koskela, Carol Tenopir, and David 
A. Vieglais. "Participatory Design of DataONE—Enabling Cyberinfrastructure for 
the Biological and Environmental Sciences." Ecological Informatics 11 (2012): 5- 
15. http://dx.doi.org/10.1016/j.ecoinf.2011.08.007 


Michener, William K., and Matthew B. Jones. "Ecoinformatics: Supporting 
Ecology as a Data-Intensive Science." Trends in Ecology & Evolution 27, no. 2 
(2012): 85-93. http://dx.doi.org/10.1016/j.tree.2011.11.016 


Michener, William K., Todd Vision, Patricia Cruse, Dave Vieglais, John Kunze, 
and Greg Janée. "DataONE: Data Observation Network for Earth—Preserving 
Data and Enabling Innovation in the Biological and Environmental Sciences." 
D-Lib Magazine 17, no. 1/2 (2011). https://doi.org/10.1045/january2011-michener 





Miksa, Tomasz, Andreas Rauber, Roman Ganguly, and Paolo Budroni. 
"Information Integration for Machine Actionable Data Management Plans." 
International Journal of Digital Curation 12, no. 1 (2017): 22-35. 
https://doi.org/10.2218/ijdc.v12i1.529 


Data management plans are free-form text documents describing the data 
used and produced in scientific experiments. The complexity of data-driven 
experiments requires precise descriptions of tools and datasets used in 
computations to enable their reproducibility and reuse. Data management 
plans fall short of these requirements. In this paper, we propose machine- 
actionable data management plans that cover the same themes as standard 
data management plans, but particular sections are filled with information 
obtained from existing tools. We present mapping of tools from the domains 
of digital preservation, reproducible research, open science, and data 
repositories to data management plan sections. Thus, we identify the 
requirements for a good solution and identify its limitations. We also propose 
a machine-actionable data model that enables information integration. The 
model uses ontologies and is based on existing standards. 
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In the era of research infrastructures and big data, sophisticated data 
management practices are becoming essential building blocks of successful 
science. Most practices follow a data-centric approach, which does not take 
into account the processes that created, analysed and presented the data. This 
fact limits the possibilities for reliable verification of results. Furthermore, it 
does not guarantee the reuse of research, which is one of the key aspects of 
credible data-driven science. For that reason, we propose the introduction of 
the new concept of Process Management Plans, which focus on the 
identification, description, sharing and preservation of the entire scientific 
processes. They enable verification and later reuse of result data and 
processes of scientific experiments. In this paper we describe the structure 
and explain the novelty of Process Management Plans by showing in what 
way they complement existing Data Management Plans. We also highlight 
key differences, major advantages, as well as references to tools and solutions 
that can facilitate the introduction of Process Management Plans. 


In the spring of 2011, the UC San Diego Research Cyberinfrastructure (RCI)
Implementation Team invited researchers and research teams to participate in 
a research curation and data management pilot program. This invitation took 
the form of a campus-wide solicitation. More than two dozen applications 
were received and, after due deliberation, the RCI Oversight Committee 
selected five curation-intensive projects. These projects were chosen based on 
a number of criteria, including how they represented campus research, 
varieties of topics, researcher engagement, and the various services required. 
The pilot process began in September 2011, and will be completed in early 
2014. Extensive lessons learned from the pilots are being compiled and are 
being used in the on-going design and implementation of the permanent 
Research Data Curation Program in the UC San Diego Library. 


In this paper, we present specific implementation details of these various 
services, as well as lessons learned. The program focused on many aspects of 
contemporary scholarship, including data creation and storage, description 
and metadata creation, citation and publication, and long term preservation 
and access. Based on the lessons learned in our processes, the Research Data 
Curation Program will provide a suite of services from which campus users 
can pick and choose, as necessary. The program will provide support for the 
data management requirements from national funding agencies. 


The ability to measure the use and impact of published data sets is key to the
success of the open data/open science paradigm. A direct measure of impact 
would require tracking data (re)use in the wild, which is difficult to achieve. 
This is therefore commonly replaced by simpler metrics based on data 
download and citation counts. In this paper we describe a scenario where it is 
possible to track the trajectory of a dataset after its publication, and show how 
this enables the design of accurate models for ascribing credit to data 
originators. A Data Trajectory (DT) is a graph that encodes knowledge of 
how, by whom, and in which context data has been re-used, possibly after 
several generations. We provide a theoretical model of DTs that is grounded 
in the W3C PROV data model for provenance, and we show how DTs can be 
used to automatically propagate a fraction of the credit associated with 
transitively derived datasets, back to original data contributors. We also show 
this model of transitive credit in action by means of a Data Reuse Simulator. 
In the longer term, our ultimate hope is that credit models based on direct 
measures of data reuse will provide further incentives to data publication. We 
conclude by outlining a research agenda to address the hard questions of 
creating, collecting, and using DTs systematically across a large number of 
data reuse instances in the wild. 


Experimental science can be thought of as the exploration of a large research
space, in search of a few valuable results. While it is this "Golden Data" that 
gets published, the history of the exploration is often as valuable to the 
scientists as some of its outcomes. We envision an e-research infrastructure 
that is capable of systematically and automatically recording such history—an 
assumption that holds today for a number of workflow management systems 
routinely used in e-science. In keeping with our gold rush metaphor, the 
provenance of a valuable result is a "Golden Trail". Logically, this represents 
a detailed account of how the Golden Data was arrived at, and technically it is 
a sub-graph in the much larger graph of provenance traces that collectively 
tell the story of the entire research (or of some of it). 


In this paper we describe a model and architecture for a repository dedicated 
to storing provenance traces and selectively retrieving Golden Trails from it. 
As traces from multiple experiments over long periods of time are
accommodated, the trails may be sub-graphs of one trace, or they may be the 
logical representation of a virtual experiment obtained by joining together 
traces that share common data. 


The project has been carried out within the Provenance Working Group of the 
Data Observation Network for Earth (DataONE) NSF project. Ultimately, our 
longer-term plan is to integrate the provenance repository into the data 
preservation architecture currently being developed by DataONE. 


The work of the Jisc Managing Research Data programme is—along with the
rest of the UK higher education sector—taking place in an environment of 
increasing pressure on research funding. In order to justify the investment 
made by Jisc in this activity—and to help make the case more widely for the 
value of investing time and money in research data management—individual 
projects and the programme as a whole must be able to clearly express the 
resultant benefits to the host institutions and to the broader sector. This paper 
describes a structured approach to the measurement and description of 
benefits provided by the work of these projects for the benefit of funders, 
institutions and researchers. We outline the context of the programme and its 
work; discuss the drivers and challenges of gathering evidence of benefits;
specify benefits as distinct from aims and outputs; present emerging findings 
and the types of metrics and other evidence which projects have provided; 
explain the value of gathering evidence in a structured way to demonstrate 
benefits generated by work in this field; and share lessons learned from 
progress to date. 


This paper will describe the efforts and findings of the JISC Data
Management Skills Support Initiative ('DaMSSI'). DaMSSI was co-funded by
the JISC Managing Research Data programme and the Research Information
Network (RIN), in partnership with the Digital Curation Centre, to review,
synthesise and augment the training offerings of the JISC Research Data
Management Training Materials ('RDMTrain') projects.


DaMSSI tested the effectiveness of the Society of College, National and 
University Libraries' Seven Pillars of Information Literacy model (SCONUL, 
2011), and Vitae's Researcher Development Framework ('Vitae RDF') for
consistently describing research data management ('RDM') skills and skills
development paths in UK HEI postgraduate courses. 


With the collaboration of the RDMTrain projects, we mapped individual 
course modules to these two models and identified basic generic data 
management skills alongside discipline-specific requirements. A synthesis of 
the training outputs of the projects was then carried out, which further 
investigated the generic versus discipline-specific considerations and other 
successful approaches to training that had been identified as a result of the 
projects' work. In addition we produced a series of career profiles to help 
illustrate the fact that data management is an essential component—in 
obvious and not-so-obvious ways—of a wide range of professions. 


We found that both models had potential for consistently and coherently 
describing data management skills training and embedding this within broader 
institutional postgraduate curricula. However, we feel that additional 
discipline-specific references to data management skills could also be 
beneficial for effective use of these models. Our synthesis work identified that 
the majority of core skills were generic across disciplines at the postgraduate
level, with the discipline-specific approach showing its value in engaging the 
audience and providing context for the generic principles. 


Findings were fed back to SCONUL and Vitae to help in the refinement of 


Results: Results indicate that UF's CTS researchers have diverse data
management needs that are often specific to their discipline or current 
research project and span the data lifecycle. A common theme in responses 
was the need for consistent data management training, particularly for 
graduate students; this led to localized training within the Health Science 
Center and CTSI, as well as campus-wide training. Another campus-wide 


This paper provides an overview of the elements required to create a
sustainable research data management (RDM) service. The paper summarises 
key learning and lessons learnt from the University of Nottingham's project to 
create an RDM service for researchers. Collective experiences and learning 
from three key areas are covered, including: data management requirements 
gathering and validation, RDM training, and the creation of an RDM website. 


Objective: This paper describes a project to revise an existing research data
management (RDM) course to include instruction in computer skills with
robust data science tools.


Setting: A Carnegie R1 university.


Brief Description: Graduate student researchers need training in the basic 
concepts of RDM. However, they generally lack experience with robust data 
science tools to implement these concepts holistically. Two library instructors 
fundamentally redesigned an existing research RDM course to include 
instruction with such tools. The course was divided into lecture and lab 
sections to facilitate the increased instructional burden. Learning objectives 
and assessments were designed at a higher order to allow students to 
demonstrate that they not only understood course concepts but could use their 
computer skills to implement these concepts. 


Results: Twelve students completed the first iteration of the course. Feedback 
from these students was very positive, and they appreciated the combination 
of theoretical concepts, computer skills and hands-on activities. Based on 
student feedback, future iterations of the course will include more "flipped" 
content including video lectures and interactive computer tutorials to 
maximize active learning time in both lecture and lab. 


Open access to data, as a core principle of open science, is predicated on
assumptions that scientific data can be reused by other researchers. We test 
those assumptions by asking where scientists find reusable data, how they 
reuse those data, and how they interpret data they did not collect themselves. 
By conducting a qualitative meta-analysis of evidence on two long-term, 
distributed, interdisciplinary consortia, we found that scientists frequently 
sought data from public collections and from other researchers for 
comparative purposes such as "ground-truthing" and calibration. When they 
sought others' data for reanalysis or for combining with their own data, which 
was relatively rare, most preferred to collaborate with the data creators. We 
propose a typology of data reuses ranging from comparative to integrative. 
Comparative data reuse requires interactional expertise, which involves 
knowing enough about the data to assess their quality and value for a specific 
comparison such as calibrating an instrument in a lab experiment. Integrative 
reuse requires contributory expertise, which involves the ability to perform 
the action, such as reusing data in a new experiment. Data integration requires 
more specialized scientific knowledge and deeper levels of epistemic trust in 
the knowledge products. Metadata, ontologies, and other forms of curation
benefit interpretation for any kind of data reuse. Based on these findings, we
theorize the data creators' advantage, that those who create data have intimate
and tacit knowledge that can be used as barter to form collaborations for 
mutual advantage. Data reuse is a process that occurs within knowledge 
infrastructures that evolve over time, encompassing expertise, trust, 
communities, technologies, policies, resources, and institutions. 


Pasquetto, Irene V., Bernadette M. Randles, and Christine L. Borgman. "On the 
Reuse of Scientific Data." Data Science Journal 16, no. 8 (2017). 
http://doi.org/10.5334/dsj-2017-008 


While science policy promotes data sharing and open data, these are not ends 
in themselves. Arguments for data sharing are to reproduce research, to make 
public assets available to the public, to leverage investments in research, and 
to advance research and innovation. To achieve these expected benefits of 
data sharing, data must actually be reused by others. Data sharing practices, 
especially motivations and incentives, have received far more study than has 
data reuse, perhaps because of the array of contested concepts on which reuse 
rests and the disparate contexts in which it occurs. Here we explicate concepts 
of data, sharing, and open data as a means to examine data reuse. We explore 
distinctions between use and reuse of data. Lastly we propose six research 
questions on data reuse worthy of pursuit by the community: How can uses of 
data be distinguished from reuses? When is reproducibility an essential goal? 
When is data integration an essential goal? What are the tradeoffs between 
collecting new data and reusing existing data? How do motivations for data 
collection influence the ability to reuse data? How do standards and formats 
for data release influence reuse opportunities? We conclude by summarizing 
the implications of these questions for science policy and for investments in 
data reuse. 


88. https://doi.org/10.2218/ijdc.v3i1.43


Research Community: Process, Challenges and Lessons." International Journal of
Digital Curation 7, no. 1 (2012): 151-162. https://doi.org/10.2218/ijdc.v7i1.222


In 2009, the Institution for Social and Policy Studies (ISPS) at Yale 
University began building an open access digital collection of social science 
experimental data, metadata, and associated files produced by ISPS 
researchers. The digital repository was created to support the replication of 
research findings and to enable further data analysis and instruction. Content 
is submitted to a rigorous process of quality assessment and normalization, 
including transformation of statistical code into R, an open source statistical 
software. Other requirements included: (a) that the repository be integrated 
with the current database of publications and projects publicly available on 
the ISPS website; (b) that it offered open access to datasets, documentation, 
and statistical software program files; (c) that it utilized persistent linking 
services and redundant storage provided within the Yale Digital Commons 
infrastructure; and (d) that it operated in accordance with the prevailing 
standards of the digital preservation community. In partnership with Yale's 
Office of Digital Assets and Infrastructure (ODAI), the ISPS Data Archive
was launched in the fall of 2010. We describe the process of creating the 
repository, discuss prospects for similar projects in the future, and explain 
how this specialized repository fits into the larger digital landscape at Yale. 


Peer, Limor, Ann Green, and Elizabeth Stephenson. "Committing to Data Quality 
Review." International Journal of Digital Curation 9, no. 1 (2014): 263-291. 
https://doi.org/10.2218/ijdc.v9i1.317 


Amid the pressure and enthusiasm for researchers to share data, a rapidly 
growing number of tools and services have emerged. What do we know about 
the quality of these data? Why does quality matter? And who should be 
responsible for data quality? We believe an essential measure of data quality 
is the ability to engage in informed reuse, which requires that data are 
independently understandable. In practice, this means that data must undergo 
quality review, a process whereby data and associated files are assessed and 
required actions are taken to ensure files are independently understandable for 
informed reuse. This paper explains what we mean by data quality review, 
what measures can be applied to it, and how it is practiced in three domain- 
specific archives. We explore a selection of other data repositories in the 
research data ecosystem, as well as the roles of researchers, academic 
libraries, and scholarly journals in regard to their application of data quality 
measures in practice. We end with thoughts about the need to commit to data
quality and who might be able to take on those tasks. 


Peer, Limor, and Stephanie Wykstra. "New Curation Software: Step-by-Step 
Preparation of Social Science Data and Code for Publication and Preservation." 
IASSIST Quarterly 39, no. 4 (2015): 6-13. https://doi.org/10.29173/iq902


Pejša, Stanislav, Shirley J. Dyke, and Thomas J. Hacker. "Building Infrastructure
for Preservation and Publication of Earthquake Engineering Research Data." 
International Journal of Digital Curation 9, no. 2 (2014): 83-97. 
https://doi.org/10.2218/ijdc.v9i2.335


The objective of this paper is to showcase the progress of the earthquake 
engineering community during a decade-long effort supported by the National 
Science Foundation in the George E. Brown, Jr. Network for Earthquake
Engineering Simulation (NEES). During the four years that NEES network 
operations have been headquartered at Purdue University, the NEEScomm 
management team has facilitated an unprecedented cultural change in the 
ways research is performed in earthquake engineering. NEES has not only 
played a major role in advancing the cyberinfrastructure required for 
transformative engineering research, but NEES research outcomes are making 
an impact by contributing to safer structures throughout the USA and abroad. 
This paper reflects on some of the developments and initiatives that helped 
instil change in the ways that the earthquake engineering and tsunami 
community share and reuse data and collaborate in general. 


Peng, Ge. "The State of Assessing Data Stewardship Maturity—An Overview." 
Data Science Journal 17 (2018): 7. http://doi.org/10.5334/dsj-2018-007


Data stewardship encompasses all activities that preserve and improve the 
information content, accessibility, and usability of data and metadata. Recent 
regulations, mandates, policies, and guidelines set forth by the U.S.
government, federal and other funding agencies, scientific societies and
scholarly publishers, have levied stewardship requirements on digital 
scientific data. This elevated level of requirements has increased the need for 
a formal approach to stewardship activities that supports compliance 
verification and reporting. Meeting or verifying compliance with stewardship 
requirements requires assessing the current state, identifying gaps, and, if 
necessary, defining a roadmap for improvement. This, however, touches on 
standards and best practices in multiple knowledge domains. Therefore, data 
stewardship practitioners, especially those at data repositories or data service
centers or associated with data stewardship programs, can benefit from 
knowledge of existing maturity assessment models. This article provides an 
overview of the current state of assessing stewardship maturity for federally 
funded digital scientific data. A brief description of existing maturity 
assessment models and related application(s) is provided. This helps 
stewardship practitioners to readily obtain basic information about these 
models. It allows them to evaluate each model's suitability for their unique 
verification and improvement needs. 


Peng, Ge, Anna Milan, Nancy A. Ritchey, Robert P. Partee II, Sonny Zinn, Evan
McQuinn, Kenneth S. Casey, Paul Lemieux III, Raisa Ionin, Philip Jones, Arianna 
Jakositz, and Donald Collins. "Practical Application of a Data Stewardship 
Maturity Matrix for the NOAA OneStop Project." Data Science Journal, 18, no. 1 
(2019): 41. http://doi.org/10.5334/dsj-2019-041 


Assessing the stewardship maturity of individual datasets is an essential part 
of ensuring and improving the way datasets are documented, preserved, and 
disseminated to users. It is a critical step towards meeting U.S. federal 
regulations, organizational requirements, and user needs. However, it is 
challenging to do so consistently and quantifiably. The Data Stewardship 
Maturity Matrix (DSMM), developed jointly by NOAA's National Centers for 
Environmental Information (NCEI) and the Cooperative Institute for Climate
and Satellites—North Carolina (CICS-NC), provides a uniform framework for 
consistently rating stewardship maturity of individual datasets in nine key 
components: preservability, accessibility, usability, production sustainability, 
data quality assurance, data quality control/monitoring, data quality 
assessment, transparency/traceability, and data integrity. So far, the DSMM 
has been applied to over 800 individual datasets that are archived and/or 
managed by NCEI, in support of the NOAA’s OneStop Data Discovery and 
Access Framework Project. As a part of the OneStop-ready process, tools, 
implementation guidance, workflows, and best practices are developed to 
assist the application of the DSMM and described in this paper. The DSMM 
ratings are also consistently captured in the ISO standard-based dataset-level 
quality metadata and citable quality descriptive information documents, 
which serve as interoperable quality information to both machine and human 
end-users. These DSMM implementation and integration workflows and best 
practices could be adopted by other data management and stewardship 
projects or adapted for applications of other maturity assessment models. 


We analyze data sharing practices of astronomers over the past fifteen years.
An analysis of URL links embedded in papers published by the American 
Astronomical Society reveals that the total number of links included in the 
literature rose dramatically from 1997 until 2005, when it leveled off at 
around 1500 per year. The analysis also shows that the availability of linked 
material decays with time: in 2011, 44% of links published a decade earlier, 
in 2001, were broken. A rough analysis of link types reveals that links to data 
hosted on astronomers’ personal websites become unreachable much faster 
than links to datasets on curated institutional sites. To gauge astronomers' 
current data sharing practices and preferences further, we performed in-depth 
interviews with 12 scientists and online surveys with 173 scientists, all at a 
large astrophysical research institute in the United States: the Harvard- 
Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth 
interviews and the online survey indicate that, in principle, there is no 
philosophical objection to data-sharing among astronomers at this institution. 
Key reasons that more data are not presently shared more efficiently in 
astronomy include: the difficulty of sharing large data sets; over reliance on 
non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); 
unfamiliarity with options that make data-sharing easier (faster) and/or more 
robust; and, lastly, a sense that other researchers would not want the data to 
be shared. We conclude with a short discussion of a new effort to implement 
an easy-to-use, robust, system for data sharing in astronomy, at 
theastrodata.org, and we analyze the uptake of that system to-date. 


Pepler, Sam, and Sarah Callaghan. "Twenty Years of Data Management in the 
British Atmospheric Data Centre." International Journal of Digital Curation 10, 
no. 2 (2015): 23-32. https://doi.org/10.2218/ijdc.v10i2.379


The British Atmospheric Data Centre (BADC) has existed in its present form 
for 20 years, having been formally created in 1994. It evolved from the GDF 
(Geophysical Data Facility), a SERC (Science and Engineering Research 
Council) facility, as a result of research council reform where NERC (Natural 
Environment Research Council) extended its remit to cover atmospheric data 
below 10km altitude. With that change the BADC took on data from many 
other atmospheric sources and started interacting with NERC research 
programmes. 


The BADC has now hit early adulthood. Prompted by this milestone, we 
examine in this paper whether the data centre is creaking at the seams or is 
looking forward to the prime of its life, gliding effortlessly into the future. 
Which parts of it are bullet proof and which parts are held together with 
double-sided sticky tape? Can we expect to see it in its present form in 
another twenty years' time? 


To answer these questions, we examine the interfaces, technology, processes 
and organisation used in the provision of data centre services by looking at 
three snapshots in time, 1994, 2004 and 2014, using metrics and reports from 
the time to compare and contrast the services using BADC. The repository
landscape has changed massively over this period and has moved the focus 
for technology and development as the broader community followed 
emerging trends, standards and ways of working. The incorporation of these 
new ideas has been both a blessing and a curse, providing the data centre staff 
with plenty of challenges and opportunities. 


We also discuss key data centre functions including: data discovery, data 
access, ingestion, data management planning, preservation plans, 
agreements/licences and data policy, storage and server technology, 
organisation and funding, and user management. We conclude that the data 
centre will probably still exist in some form in 2024 and that it will most 
likely still be reliant on a file system. However, the technology delivering this 
service will change and the host organisation and funding routes may vary. 


Data Experts around Data Management Planning." Data Science Journal, 18, no. 1
(2019): 59. http://doi.org/10.5334/dsj-2019-059 


The Data Stewardship Wizard is a tool for data management planning that is 
focused on getting the most value out of data management planning for the 
project itself rather than on fulfilling obligations. It is based on FAIR Data 
Stewardship, in which each data-related decision in a project acts to optimize 
the Findability, Accessibility, Interoperability and/or Reusability of the data. 
The background to this philosophy is that the first reuser of the data is the 
researcher themselves. The tool encourages the consulting of expertise and 
experts, can help researchers avoid risks they did not know they would 
encounter by confronting them with practical experience from others, and can 
help them discover helpful technologies they did not know existed. 


In this paper, we discuss the context and motivation for the tool, we explain 
its architecture and we present key functions, such as the knowledge model 
evolvability and migrations, assembling data management plans, metrics and 
evaluation of data management plans. 


MacDonald. "Research Data Management in Academic Institutions: A Scoping
Review." PLOS ONE 12, no. 5 (2017): e0178261. 


Objective


The purpose of this study is to describe the volume, topics, and
methodological nature of the existing research literature on research data
management in academic institutions.


Materials and methods


We conducted a scoping review by searching forty literature databases
encompassing a broad range of disciplines from inception to April 2016. We
included all study types and data extracted on study design, discipline, data
collection tools, and phase of the research data lifecycle.


Results


We included 301 articles plus 10 companion reports after screening 13,002
titles and abstracts and 654 full-text articles. Most articles (85%) were 
published from 2010 onwards and conducted within the sciences (86%). More 
than three-quarters of the articles (78%) reported methods that included 
interviews, cross-sectional, or case studies. Most articles (68%) included the 
Giving Access to Data phase of the UK Data Archive Research Data 
Lifecycle that examines activities such as sharing data. When studies were 
grouped into five dominant groupings (Stakeholder, Data, Library, 
Tool/Device, and Publication), data quality emerged as an integral element. 


Conclusion 


Most studies relied on self-reports (interviews, surveys) or accounts from an 
observer (case studies) and we found few studies that collected empirical 
evidence on activities amongst data producers, particularly those examining 
the impact of research data management interventions. As well, fewer studies 
examined research data management at the early phases of research projects. 
The quality of all research outputs needs attention, from the application of 
best practices in research data management studies, to data producers 
depositing data in repositories for long-term use. 


A Case Study." Data Science Journal, 18, no. 1 (2019): 43.
http://doi.org/10.5334/dsj-2019-043 


We present a joint effort at Virginia Tech between a research group in the 
Department of Fish and Wildlife Conservation and Data Services in the 
University Libraries to improve data management for long-term ecological 
field research projects in the Florida Panhandle. Consultative research data 
management support from Data Services in the University Libraries played an 
integral role in the development of the training curriculum. Emphasizing the 
importance of data quality to the field workers at the beginning of this 
training curriculum was a vital part of its success. Also critical for success 
was the research group's investment of time and effort to work with field
workers and improve data management systems. We compare this case study 
to three others in the literature to compare and contrast data management 
processes and procedures. This case study serves as one example of how 
targeted training and efforts in data and project management for a research 
project can lead to substantial improvements in research data quality. 
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Adjusting a Repository Framework." International Journal of Digital Curation 12, 
no. 2 (2017): 234-245. https://doi.org/10.2218/ijdc.v12i2.571 


Handling heterogeneous data, subject to minimal costs, can be perceived as a 
classic management problem. The approach at hand applies established 
managerial theorizing to the field of data curation. It is argued, however, that 
data curation cannot merely be treated as a standard case of applying 
management theory in a traditional sense. Rather, the practice of curating 
humanities research data, the specifications and adjustments of the model 
suggested here reveal an intertwined process, in which knowledge of both 
strategic management and solid information technology have to be 
considered. Thus, suggestions on the strategic positioning of research data, 
which can be used as an analytical tool to understand the proposed workflow 
mechanisms, and the definition of workflow modules, which can be flexibly 
used in designing new standard workflows to configure research data 
repositories, are put forward. 


Pienta, Amy M., Dharma Akmon, Justin Noble, Lynette Hoelter, and Susan 
Jekielek. "A Data-Driven Approach to Appraisal and Selection at a Domain Data 
Repository." International Journal of Digital Curation 12, no. 2 (2017): 
https://doi.org/10.2218/ijdc.v12i2.500 


Social scientists are producing an ever-expanding volume of data, leading to 
questions about appraisal and selection of content given finite resources to 
process data for reuse. We analyze users' search activity in an established 
social science data repository to better understand demand for data and more 
effectively guide collection development. By applying a data-driven 
approach, we aim to ensure curation resources are applied to make the most 
valuable data findable, understandable, accessible, and usable. We analyze 
data from a domain repository for the social sciences that includes over 
500,000 annual searches in 2014 and 2015 to better understand trends in user 
search behavior. Using a newly created search-to-study ratio technique, we 
identified gaps in the domain data repository's holdings and leveraged this 
analysis to inform our collection and curation practices and policies. The 
evaluative technique we propose in this paper will serve as a baseline for 
future studies looking at trends in user demand over time at the domain data 
repository being studied with broader implications for other data repositories. 


Pinfield, Stephen, Andrew M. Cox, and Jen Smith. "Research Data Management 
and Libraries: Relationships, Activities, Drivers and Influences." PLoS ONE 9, no. 
12 (2014): e114734. https://doi.org/10.1371/journal.pone.0114734 


The management of research data is now a major challenge for research 
organisations. Vast quantities of born-digital data are being produced in a 
wide variety of forms at a rapid rate in universities. This paper analyses the 
contribution of academic libraries to research data management (RDM) in the 
wider institutional context. In particular it: examines the roles and 
relationships involved in RDM, identifies the main components of an RDM 
programme, evaluates the major drivers for RDM activities, and analyses the 
key factors influencing the shape of RDM developments. The study is written 
from the perspective of library professionals, analysing data from 26 semi- 
structured interviews of library staff from different UK institutions. This is an 
early qualitative contribution to the topic complementing existing quantitative 
and case study approaches. Results show that although libraries are playing a 
significant role in RDM, there is uncertainty and variation in the relationship 
with other stakeholders such as IT services and research support offices. 
Current emphases in RDM programmes are on developments of policies and 
guidelines, with some early work on technology infrastructures and support 
services. Drivers for developments include storage, security, quality, 
compliance, preservation, and sharing with libraries associated most closely 
with the last three. The paper also highlights a ‘jurisdictional’ driver in which 
libraries are claiming a role in this space. A wide range of factors, including 
governance, resourcing and skills, are identified as influencing ongoing 
developments. From the analysis, a model is constructed designed to capture 
the main aspects of an institutional RDM programme. This model helps to 
clarify the different issues involved in RDM, identifying layers of activity, 
multiple stakeholders and drivers, and a large number of factors influencing 
the implementation of any initiative. Institutions may usefully benchmark 
their activities against the data and model in order to inform ongoing RDM 
activity. 
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In common with many global research funding agencies, in 2011 the UK 
Engineering and Physical Sciences Research Council (EPSRC) published its 
Policy Framework on Research Data along with a mandate that institutions be 
fully compliant with the policy by May 2015. The University of Bath has a 
strong applied science and engineering research focus and, as such, the 
EPSRC is a major funder of the university's research. In this paper, the Jisc- 
funded Research360 project shares its experience in developing the 
infrastructure required to enable a research-intensive institution to achieve full 
compliance with a particular funder's policy, in such a way as to support the 
varied data management needs of both the University of Bath and its external 
stakeholders. A key feature of the Research360 project was to ensure that 
after the project's completion in summer 2013 the newly developed data 
management infrastructure would be maintained up to and beyond the 
EPSRC's 2015 deadline. Central to these plans was the 'University of Bath 
Roadmap for EPSRC', which was identified as an exemplar response by the 
EPSRC. This paper explores how a roadmap designed to meet a single 
funder's requirements can be compatible with the strategic goals of an 
institution. Also discussed is how the project worked with Charles Beagrie 
Ltd to develop a supporting business case, thus ensuring implementation of 
these long-term objectives. This paper describes how two new data 
management roles, the Institutional Data Scientist and Technical Data 
Coordinator, have contributed to delivery of the Research360 project and the 
importance of these new types of cross-institutional roles for embedding a 
new data management infrastructure within an institution. Finally, the 
experience of developing a new institutional data policy is shared. This policy 
represents a particular example of the need to reconcile a funder's 
expectations with the needs of individual researchers and their collaborators. 
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Many initiatives encourage investigators to share their raw datasets in hopes 
of increasing research efficiency and quality. Despite these investments of 
time and money, we do not have a firm grasp of who openly shares raw 
research data, who doesn't, and which initiatives are correlated with high rates 
of data sharing. In this analysis I use bibliometric methods to identify patterns 
in the frequency with which investigators openly archive their raw gene 
expression microarray datasets after study publication. 


Automated methods identified 11,603 articles published between 2000 and 
2009 that describe the creation of gene expression microarray data. 
Associated datasets in best-practice repositories were found for 25% of these 
articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. 
Accounting for sensitivity of the automated methods, approximately 45% of 
recent gene expression studies made their data publicly available. 


First-order factor analysis on 124 diverse bibliometric attributes of the data 
creation articles revealed 15 factors describing authorship, funding, 
institution, publication, and domain environments. In multivariate regression, 
authors were most likely to share data if they had prior experience sharing or 
reusing data, if their study was published in an open access journal or a 
journal with a relatively strong data sharing policy, or if the study was funded 
by a large number of NIH grants. Authors of studies on cancer and human 
subjects were least likely to make their datasets available. 


These results suggest research data sharing levels are still low and increasing 
only slowly, and data is least available in areas where it could make the 
biggest impact. Let's learn from those with high rates of sharing to embrace 
the full potential of our research output. 
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Background 


Sharing research data provides benefit to the general scientific community, 
but the benefit is less obvious for the investigator who makes his or her data 
available. 


Principal Findings 


We examined the citation history of 85 cancer microarray clinical trial 
publications with respect to the availability of their data. The 48% of trials 
with publicly available microarray data received 85% of the aggregate 
citations. Publicly available data was significantly (p=0.006) associated with a 
69% increase in citations, independently of journal impact factor, date of 
publication, and author country of origin using linear regression. 


Significance 


This correlation between publicly available data and increased literature 
impact may further motivate investigators to share their detailed research 
data. 
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Background: Librarians developed a pilot program to provide training, 
resources, strategies, and support for medical libraries seeking to establish 
research data management (RDM) services. Participants were required to 
complete eight educational modules to provide the necessary background in 
RDM. Each participating institution was then required to use two of the 
following three elements: (1) a template and strategies for data interviews, (2) 
the Teaching Toolkit to teach an introductory RDM class, or (3) strategies for 
hosting a data class series. 


Case Presentation: Six libraries participated in the pilot, with between two 
and eight librarians participating from each institution. Librarians from each 
institution completed the online training modules. Each institution conducted 
between six and fifteen data interviews, which helped build connections with 
researchers, and taught between one and five introductory RDM classes. All 
classes received very positive evaluations from attendees. Two libraries 
conducted a data series, with one bringing in instructors from outside the 
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INTRODUCTION Data has moved into the spotlight as an important 
scholarly output that should be shared with the scientific community for 
replication and re-use in new contexts. This has a direct impact on libraries, 
archives, and other service providers in the data curation and access 
landscape. DESCRIPTION OF PROJECT The GESIS Data Archive for the 
Social Sciences (DAS) has been curating and disseminating social science 
research data since 1960. The article presents tools, services, and strategies 
developed by the DAS to support the research community in adequately 
responding to the legal, ethical, and practical challenges that the 
transformation towards data-centric, open science presents. These include 
GESIS's Secure Data Center, the data publication platform "datorium" and a 
recent project to create a georeferencing service for survey data. LESSONS 
LEARNED The experiences gained through these activities show that getting 
involved now, rather than further down the road, pays off in that it allows 
service providers to actively shape the ongoing transformation. At the same 
time, by cooperating with suitable partners, the effort and investment of 
resources can be kept at a manageable level for individual organizations. 
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This paper discusses work to implement the University of Edinburgh 
Research Data Management (RDM) policy by developing the services needed 
to support researchers and fulfil obligations within a changing national and 
international setting. This is framed by an evolving Research Data 
Management Roadmap and includes a governance model that ensures 
cooperation amongst Information Services (IS) managers and oversight by an 
academic-led steering group. IS has taken requirements from research groups 
and IT professionals, and at the request of the steering group has conducted 
pilot work involving volunteer research units within the three colleges to 
develop functionality and presentation for the key services. The first pilots 
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report on the plans, achievements and challenges encountered while we 
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Many large research universities provide research data management (RDM) 
support services for researchers. These may include support for data 
management planning, best practices (e.g., organization, support, and 
storage), archiving, sharing, and publication. However, these data-focused 
services may under-emphasize the importance of the software that is created 
to analyse said data. This is problematic for several reasons. First, because 
software is an integral part of research across all disciplines, it undermines the 
ability of said research to be understood, verified, and reused by others (and 
perhaps even the researcher themselves). Second, it may result in less 
visibility and credit for those involved in creating the software. A third reason 
is related to stewardship: if there is no clear process for how, when, and 
where the software associated with research can be accessed and who will be 
responsible for maintaining such access, important details of the research may 
be lost over time. 


This article presents the process by which the RDM services unit of a large 
research university addressed the lack of emphasis on software and source 
code in their existing service offerings. The greatest challenges were related 
to the need to incorporate software into existing data-oriented service 
workflows while minimizing additional resources required, and the nascent 
state of software curation and archiving in a data management context. The 
problem was addressed from four directions: building an understanding of 
software curation and preservation from various viewpoints (e.g., video 
games, software engineering), building a conceptual model of software 
preservation to guide service decisions, implementing software-related 
services, and documenting and evaluating the work to build expertise and 
establish a standard service level. 
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Participation." PLOS Biology 12, no. 1 (2014): e1001779. 
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An increasing number of publishers and funding agencies require public data 
archiving (PDA) in open-access databases. PDA has obvious group benefits 
for the scientific community, but many researchers are reluctant to share their 
data publicly because of real or perceived individual costs. Improving 
participation in PDA will require lowering costs and/or increasing benefits 
for primary data collectors. Small, simple changes can enhance existing 
measures to ensure that more scientific data are properly archived and made 
publicly available: (1) facilitate more flexible embargoes on archived data, (2) 
encourage communication between data generators and re-users, (3) disclose 
data re-use ethics, and (4) encourage increased recognition of publicly 
archived data. 


Rousidis, Dimitris, Emmanouel Garoufallou, Panos Balatsoukas, and Miguel- 
Angel Sicilia. "Metadata for Big Data: A Preliminary Investigation of Metadata 
Quality Issues in Research Data Repositories." Information Services & Use 34, no. 
3-4 (2014): 279-286. https://doi.org/10.3233/isu-140746 
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https://doi.org/10.1186/s41073-017-0028-9 


Background 


The foundation of health and medical research is data. Data sharing facilitates 
the progress of research and strengthens science. Data sharing in research is 
widely discussed in the literature; however, there are seemingly no evidence- 
based incentives that promote data sharing. 


Methods 


A systematic review (registration: doi.org/10.17605/OSF.IO/6PZSE) of the 
health and medical research literature was used to uncover any evidence- 
based incentives, with pre- and post-empirical data that examined data sharing 
rates. We were also interested in quantifying and classifying the number of 
opinion pieces on the importance of incentives, the number of observational 
studies that analysed data sharing rates and practices, and strategies aimed at 
increasing data sharing rates. 


Results 


Only one incentive (using open data badges) has been tested in health and 
medical research that examined data sharing rates. The number of opinion 
pieces (n=85) out-weighed the number of article-testing strategies (n=76), and 
the number of observational studies exceeded them both (n=106). 


Conclusions 


Given that data is the foundation of evidence-based health and medical 
research, it is paradoxical that there is only one evidence-based incentive to 
promote data sharing. More well-designed studies are needed in order to 
increase the currently low rates of data sharing. 


Rueda, Laura, Martin Fenner, and Patricia Cruse. "DataCite: Lessons Learned on 
Persistent Identifiers for Research Data." International Journal of Digital Curation 
11, no. 2 (2017): 39-47. https://doi.org/10.2218/ijdc.v11i2.421 


Data are the infrastructure of science and they serve as the groundwork for 
scientific pursuits. Data publication has emerged as a game-changing 
breakthrough in scholarly communication. Data form the outputs of research 
but also are a gateway to new hypotheses, enabling new scientific insights and 
driving innovation. And yet stakeholders across the scholarly ecosystem, 
including practitioners, institutions, and funders of scientific research are 
increasingly concerned about the lack of sharing and reuse of research data. 
Across disciplines and countries, researchers, funders, and publishers are 
pushing for a more effective research environment, minimizing the 
duplication of work and maximizing the interaction between researchers. 
Availability, discoverability, and reproducibility of research outputs are key 
factors to support data reuse and make possible this new environment of 
highly collaborative research. 


An interoperable e-infrastructure is imperative in order to develop new 
platforms and services for data publication and reuse. DataCite has been 
working to establish and promote methods to locate, identify and share 
information about research data. Along with service development, DataCite 
supports and advocates for the standards behind persistent identifiers (in 
particular DOIs, Digital Object Identifiers) for data and other research 
outputs. Persistent identifiers allow different platforms to exchange 
information consistently and unambiguously and provide a reliable way to 
track citations and reuse. Because of this, data publication can become a 
reality from a technical standpoint, but the adoption of data publication and 
data citation as a practice by researchers is still in its early stages. 


Since 2009, DataCite has been developing a series of tools and services to 
foster the adoption of data publication and citation among the research 
community. Through the years, DataCite has worked in close collaboration 
with interdisciplinary partners on these issues and we have gained insight into 
the development of data publication workflows. This paper describes the 
types of different actions and the lessons learned by DataCite. 
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(2013): 205-214. https://doi.org/10.2218/ijdc.v8i2.284 


The University of Oxford is preparing systems and services to enable 
members of the university to manage research data produced by its scholars. 
Much of the work has been carried out under the Jisc-funded Damaro project. 
This project draws together existing nascent services, adds new systems and 
services to 'fill the gaps' and provides a wide-ranging infrastructure. 
Development comprises four parallel strands: endorsement of a university 
research data management policy; training and guidance in research data 
management; technical infrastructure; and future sustainability. A key 
element of the technical infrastructure is DataFinder, a catalogue of Oxford 
research data outputs. DataFinder's core purposes are to record the existence 
of Oxford datasets, enable their discovery, and provide details of their 
location. DataFinder will record metadata about Oxford research data, 
irrespective of location, discipline or format, and is viewed by the university 
as a crucial hub for the university's Research Data Management (RDM) 
infrastructure. 


. "DataFinder: A Research Data Catalogue for Oxford." Ariadne, no. 71 
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Strategies towards a Shared Goal." International Journal of Digital Curation 7, no. 
2 (2012): 123-129. https://doi.org/10.2218/ijdc.v7i2.235 


This paper provides a comparative discussion of the strategies employed in 
the UK's DMP Online tool and the US's DMPTool, both designed to provide a 
structured environment for research data management planning (DMP) with 
explicit links to funder requirements. Following the Sixth International 
Digital Curation Conference, held in Chicago in December 2010, a number of 
US institutions partnered with the Digital Curation Centre's DMP Online team 
to learn from their experiences while developing a US counterpart. DMPTool 
arrived in beta in August 2011 and released a production version in 
November 2011. This joint paper will compare and contrast use cases, 
organizational and national/cultural characteristics that have influenced the 
development decisions, outcomes achieved so far, and planned future 
developments. 
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Calm and Fill in Your DMP: Lessons Learnt from a Swiss DMP-Template 
Initiative." International Journal of Digital Curation 13, no. 1 (2018): 215-222. 
https://doi.org/10.2218/ijdc.v13i1.617 


Aligning with other funders such as Horizon 2020, the Swiss National 
Science Foundation (SNSF) requires researchers who apply for project 
funding to provide a Data Management Plan (DMP) as an integral part of 
their research proposal. In an attempt to assist and guide researchers filling 
out this document, and to provide a service as efficient as possible, the 
libraries of the Ecole Polytechnique Fédérale de Lausanne (EPFL) and ETH 
Zurich took the lead to elaborate on a DMP template with content suggestions 
and recommendations. In this practice paper, we will describe the 
collaborative effort between the two Swiss federal institutes of technology, 
namely EPFL and ETH Zurich, as well as some partners of the national Data 
Life Cycle Management (DLCM) project, which resulted in a very helpful 
document as reported by our researchers. 
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98-110. https://doi.org/10.2218/ijdc.v9i2.336 


This article reports on the transfer of a massive scientific dataset from a 
national laboratory to a university library, and from one kind of workforce to 
another. We use the transfer of the Sloan Digital Sky Survey (SDSS) archive 
to examine the emergence of a new workforce for scientific research data 
management. Many individuals with diverse educational backgrounds and 
domain experience are involved in SDSS data management: domain 
scientists, computer scientists, software and systems engineers, programmers, 
and librarians. These types of positions have been described using terms such 
as research technologist, data scientist, e-science professional, data curator, 
and more. The findings reported here are based on semi-structured interviews, 
ethnographic participant observation, and archival studies from 2011-2013. 


The library staff conducting the data storage and archiving of the SDSS 
archive faced two performance problems. The preservation specialist and the 
system administrator worked together closely to discover and implement 
solutions to the slow data transfer and verification processes. The team 
overcame these slow-downs by problem solving, working in a team, and 
writing code. The library team lacked the astronomy domain knowledge 
necessary to meet some of their preservation and curation goals. 


The case study reveals the variety of expertise, experience, and individuals 
essential to the SDSS data management process. A variety of backgrounds 
and educational histories emerge in the data managers studied. Teamwork is 
necessary to bring disparate expertise together, especially between those with 
technical and domain education. The findings have implications for data 
management education, policy and relevant stakeholders. 


This article is part of continuing research on Knowledge Infrastructures. 
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Case Study." Journal of eScience Librarianship 4, no. 1 (2015): e1076. 
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Background 


Many journals now require authors share their data with other investigators, 
either by depositing the data in a public repository or making it freely 
available upon request. These policies are explicit, but remain largely 
untested. We sought to determine how well authors comply with such policies 
by requesting data from authors who had published in one of two journals 
with clear data sharing policies. 


Methods and Findings 


We requested data from ten investigators who had published in either PLoS 
Medicine or PLoS Clinical Trials. All responses were carefully documented. 
In the event that we were refused data, we reminded authors of the journal's 
data sharing guidelines. If we did not receive a response to our initial request, 
a second request was made. Following the ten requests for raw data, three 
investigators did not respond, four authors responded and refused to share 
their data, two email addresses were no longer valid, and one author requested 
further details. A reminder of PLoS's explicit requirement that authors share 
data did not change the reply from the four authors who initially refused. 
Only one author sent an original data set. 


Conclusions 


We received only one of ten raw data sets requested. This suggests that 
journal policies requiring data sharing do not lead to authors making their 
data sets available to independent investigators. 


Savage, James L., and Lauren Cadwallader. "Establishing, Developing, and 
Sustaining a Community of Data Champions." Data Science Journal, 18, no. 1 
(2019): 23. http://doi.org/10.5334/dsj-2019-023 


Supporting good practice in Research Data Management (RDM) is 
challenging for higher education institutions, in part because of the diversity 
of research practices and data types across disciplines. While centralised 
research data support units now exist in many universities, these typically 
possess neither the discipline-specific expertise nor the resources to offer 
appropriate targeted training and support within every academic unit. One 
solution to this problem is to identify suitable individuals with discipline- 
specific expertise that are already embedded within each unit, and empower 
these individuals to advocate for good RDM and to deliver support locally. 
This article focuses on an ongoing example of this approach: the Data 
Champion Programme at the University of Cambridge, UK. We describe how 
the Data Champion programme was established; the programme's reach, 
impact, strengths and weaknesses after two years of operation; and our 
anticipated challenges and planned strategies for maintaining the programme 
over the medium- and long-term. 


Sayogo, Djoko Sigit, and Theresa A. Pardo. "Exploring the Determinants of 
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pillar Principle Towards Data FAIRness." Data Science Journal, 18, no. 1 (2019): 
6. http://doi.org/10.5334/dsj-2019-006 


Research Data Management at Bielefeld University is considered as a cross- 
cutting task among central facilities and research groups at the faculties. 
While initially started as project "Bielefeld Data Informium" lasting over 
seven years (2010-2015), it is now being expanded by setting up a 
Competence Center for Research Data. The evolution of the institutional 
RDM is based on the three-pillar principle: 1. Policies, 2. Technical 
infrastructure and 3. Support structures. The problem of data quality and the 
issues with reproducibility of research data is addressed in the project 
Conquaire. It is creating an infrastructure for the processing and versioning of 
research data which will finally allow publishing of research data in the 
institutional repository. Conquaire extends the existing RDM infrastructure in 
three ways: with a Collaborative Platform, Data Quality Checking, and 
Reproducible Research. 
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11, no. 1 (2016): e0146695. http://dx.doi.org/10.1371/journal.pone.0146695 
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This paper presents the findings of the Belmont Forum's survey on Open Data 
which targeted the global environmental research and data infrastructure 
community. It highlights users' perceptions of the term "open data", 
expectations of infrastructure functionalities, and barriers and enablers for the 
sharing of data. A wide range of good practice examples was pointed out by 
the respondents which demonstrates a substantial uptake of data sharing 
through e-infrastructures and a further need for enhancement and 
consolidation. Among all policy responses, funder policies seem to be the 
most important motivator. This supports the conclusion that stronger 
mandates will strengthen the case for data sharing. 
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The paper presents results from a campus-wide survey at the University of 
Lille (France) on research data management in social sciences and 
humanities. The survey received 270 responses, equivalent to 15% of the 
whole sample of scientists, scholars, PhD students, administrative and 
technical staff (research management, technical support services); all 
disciplines were represented. The responses show a wide variety of practice 
and usage. The results are discussed regarding job status and disciplines and 
compared to other surveys. Four groups can be distinguished, i.e. pioneers 
(20-25%), motivated (25-30%), unaware (30%) and reluctant (5-10%). 
Finally, the next steps to improve the research data management on the 
campus are presented. 
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In this paper, we discuss the various stages of the institution-wide project that 
led to the adoption of the data management policy at Leiden University in 
2016. We illustrate this process by highlighting how we have involved all 
stakeholders. Each organisational unit was represented in the project teams. 
Results were discussed in a sounding board with both academic and support 
staff. Senior researchers acted as pioneers and raised awareness and 
commitment among their peers. By way of example, we present pilot projects 
from two faculties. We then describe the comprehensive implementation 
programme that will create facilities and services that must allow 
implementing the policy as well as monitoring and evaluating it. Finally, we 
will present lessons learnt and steps ahead. The engagement of all 
stakeholders, as well as explicit commitment from the Executive Board, has 
been an important key factor for the success of the project and will continue 
to be an important condition for the steps ahead.


SUPER, a Study of User Priorities for e-infrastructure for Research, was a
six-month effort funded by the UK e-Science Core Programme and JISC. Its 
aim was to inform investment in order to provide a usable, useful, and 
accessible e-infrastructure for all researchers and a coherent set of
e-infrastructure services that would increase usage by at least a factor of ten by
2010. Through a series of unstructured face-to-face interviews with over 45
participants from 30 different projects, an online survey, together with a
day-long workshop at NeSC, we have observed recurring issues relating to the
provision of e-infrastructure. In this article we focus on the data-related issues 
identified during these interactions. We conclude with a prioritised list of 
future activities for research, development, and adoption in the data space.


A study of 56 professors at five American universities found that a majority
had little understanding of principles, well-known in the field of data curation,
informing the ongoing administration of digital materials and chose to
manage and store work-related data by relying on the use of their own storage 
devices and cloud accounts. It also found that a majority of them had 
experienced the loss of at least one work-related digital object that they 
considered to be important in the course of their professional career. Despite 
such a rate of loss, a majority of respondents expressed at least a moderate 
level of confidence that they would be able to make use of their digital objects 
in 25 years. The data suggest that many faculty members are unaware that 
their data is at risk. They also indicate a strong correlation between faculty 
members' digital object loss and their data management practices. University 
professors producing digital objects can help themselves by becoming aware 
that these materials are subject to loss. They can also benefit from awareness 
and use of better personal data management practices, as well as participation 
in university-level programmatic digital curation efforts and the availability of 
more readily accessible, robust infrastructure for the storage of digital 
materials.


This paper describes initial experiences in evaluating an established data
archive with a long-standing commitment to preservation and dissemination 
of social science research data against recently formulated standards for 
trustworthy digital archives. As stakeholders need to be sure that the data they 
produce, use or fund is treated according to common standards, the GESIS 
Data Archive decided to start a process of audit and certification within the 
European Framework of Certification and Audit, starting with the Data Seal 
of Approval (DSA). This paper gives an overview of workflows within the 
archive and illustrates some of the steps necessary to obtain the DSA as well 
as to optimize some of its services. Finally, a short appraisal of the method of 
the DSA is made.


This paper presents the findings, lessons learned and next steps associated
with the implementation of the immersiveInformatics pilot: a distinctive 
research data management (RDM) training programme designed in 
collaboration between UKOLN Informatics and the Library at the University 
of Melbourne, Australia. The pilot aimed to equip a broad range of academic 
and professional staff roles with RDM skills as a key element of capacity and 
capability building within a single institution.


Traditionally, the formal scientific output in most fields of natural science has
been limited to peer-reviewed academic journal publications, with less 
attention paid to the chain of intermediate data results and their associated 
metadata, including provenance. In effect, this has constrained the 
representation and verification of the data provenance to the confines of the 
related publications. Detailed knowledge of a dataset's provenance is essential 
to establish the pedigree of the data for its effective re-use, and to avoid 
redundant re-enactment of the experiment or computation involved. It is 
increasingly important for open-access data to determine their authenticity 
and quality, especially considering the growing volumes of datasets appearing 
in the public domain. To address these issues, we present an approach that 
combines the Digital Object Identifier (DOI)—a widely adopted citation 
technique—with existing, widely adopted climate science data standards to 
formally publish detailed provenance of a climate research dataset as an 
associated scientific workflow. This is integrated with linked-data compliant 
data re-use standards (e.g. OAI-ORE) to enable a seamless link between a 
publication and the complete trail of lineage of the corresponding dataset, 
including the dataset itself.


[...] them with service providers for support. Discussions are focused on
converting DMPs from a stick to a carrot. Researchers and other stakeholders 
must come to regard them as a benefit: something useful for doing their 
research, a manifest of their methods and outputs that can be used for 
reporting, evaluation and implementation, rather than an annoying 
administrative burden. 


This paper reviews the work underway by different groups to gather user 
requirements and trial solutions. It notes several international fora where 
discussions are taking place and lists DMP platforms in active development. 
We offer a summary of where things are going, who needs to be involved and
[...] and turn machine-actionable DMPs into reality.


Objective: To ensure that resources designed to teach skills and best practices
for scientific research data sharing and management are useful, the 
maintainers of those materials need to evaluate and update them to ensure 
their accuracy, currency, and quality. This paper advances the use and process 
of outside peer review for community resources in addressing ongoing 
accuracy, quality, and currency issues. It further describes the next step of 
moving the updated materials to an online collaborative community platform 
for future iterative review in order to build upon mechanisms for open 
science, ongoing iteration, participation, and transparent community 
engagement. 


Setting: Research data management resources were developed in support of 
the DataONE (Data Observation Network for Earth) project, which has 
deployed a sustainable, long-term network to ensure the preservation and 
access to multi-scale, multi-discipline, and multi-national environmental and 
biological science data (Michener et al. 2012). Created by members of the 
Community Engagement and Education (CEE) Working Group in 2011-2012, 
the freely available Educational Modules included three complementary 
components (slides, handouts, and exercises) that were designed to be 
adaptable for use in classrooms as well as for research data management 
training. 


Methods: Because the modules were initially created and launched in
2011-2012, the current members of the (renamed) Community Engagement and
Outreach (CEO) Working Group were concerned that the materials could be
and/or quickly become outdated and should be reviewed for accuracy,
currency, and quality. In November 2015, the Working Group developed an 
evaluation rubric for use by outside reviewers. Review criteria were 
developed based on surveys and usage scenarios from previous DataONE 
projects. Peer reviewers were selected from the DataONE community 
network for their expertise in the areas covered by one of the 11 educational 
modules. Reviewers were contacted in March 2016, and were asked to 
volunteer to complete their evaluations online within one month of the 
request, by using a customized Google form. 


Results: For the 11 modules, 22 completed reviews were received by April 
2016 from outside experts. Comments on all three components of each
[...] postdoctoral fellow attached to the CEO Working Group. These reviews
contributed to the full evaluation and revision by members of the Working 
Group of all educational modules in September 2016. This review process, as 
well as the potential lack of funding for ongoing maintenance by Working 
Group members or paid staff, provoked the group to transform the modules to 
a more stable, non-proprietary format, and move them to an online open 
repository hosting platform, GitHub. These decisions were made to foster 
sustainability, community engagement, version control, and transparency. 


Conclusion: Outside peer review of the modules by experts in the field was 
beneficial for highlighting areas of weakness or overlap in the education 
modules. The modules were initially created in 2011-2012 by an earlier 
iteration of the Working Group, and updates were needed due to the constantly
evolving practices in the field. Because the review process was lengthy
(approximately one year) relative to the rate of innovations in data
management practices, the Working Group discussed other options that would 
allow community members to make updates available more quickly. The 
intent of migrating the modules to an online collaborative platform (GitHub) 
is to allow for iterative updates and ongoing outside review, and to provide 
further transparency about accuracy, currency, and quality in the spirit of 
open science and collaboration. Documentation about this project may be 
useful for others trying to develop and maintain educational resources for 
engagement and outreach, particularly in communities and spaces where [...]


Reproducibility and reusability of research results is an important concern in
scientific communication and science policy. A foundational element of 
reproducibility and reusability is the open and persistently available 
presentation of research data. However, many common approaches for
[...] machine actionability of cited data. The main target audience for the common
implementation guidelines in this article consists of publishers, scholarly
[...] from these recommendations. The guidance provided here is intended to help
achieve widespread, uniform human and machine accessibility of deposited 
data, in support of significantly improved verification, validation, 
reproducibility and re-use of scholarly/scientific data.


Data, unlike some wines, do not improve with age. The contrary view, that
data are immortal, a view that may underlie the often-observed tendency to 
recycle old examples in texts and presentations, is illustrated with three 
classical examples and rebutted by further examination. Some general lessons 
for data science are noted, as well as some history of statistical worries about 
the effect of data selection on induction and related themes in recent histories 
of science.


Data citations have become widely accepted. Technical infrastructures as well
as principles and recommendations for data citation are in place but best 
practices or guidelines for their implementation are not yet available. On the 
other hand, the scientific climate community requests early citations on 
evolving data for credit, e.g. for CMIP6 (Coupled Model Intercomparison 
Project Phase 6). The data citation concept for CMIP6 is presented. The main 
challenges lie in limited resources, a strict project timeline and the 
dependency on changes of the data dissemination infrastructure ESGF (Earth 
System Grid Federation) to meet the data citation requirements. Therefore a 
pragmatic, flexible and extendible approach for the CMIP6 data citation 
service was developed, consisting of a citation for the full evolving data 
superset and a data cart approach for citing the concrete used data subset. This 
two citation approach can be implemented according to the RDA 
recommendations for evolving data. Because of resource constraints and 
missing project policies, the implementation of the second part of the citation 
concept is postponed to CMIP7.


Scholarly researchers today are increasingly required to engage in a range of
data management planning activities to comply with institutional policies, or 
as a precondition for publication or grant funding. The latter is especially true 
in the U.S. in light of the recent White House Office of Science and 
Technology Policy (OSTP) mandate aimed at maximizing the availability of 
all outputs—data as well as the publications that summarize them—resulting 
from federally-funded research projects.


[...] Champaign, and University of Virginia Library—collaborated on the
development of the DMPTool, an online application that helps researchers 
create data management plans. The DMPTool provides detailed guidance, 
links to general and institutional resources, and walks a researcher through the 
process of generating a comprehensive plan tailored to specific DMP 
requirements. The uptake of the DMPTool has been positive: to date, it has 
been used by over 6,000 researchers from 800 institutions, making use of 
more than 20 requirements templates customized for funding bodies. 


With support from the Alfred P. Sloan Foundation, project partners are now 
engaged in enhancing the features of the DMPTool. The second version of the 
tool has enhanced functionality for plan creators and institutional 
administrators, as well as a redesigned user interface and an open RESTful 
application programming interface (API). 


New administrative functions provide the means for institutions to better 
support local research activities. New capabilities include support for plan
co-ownership; workflow provisions for internal plan review; simplified
maintenance and addition of DMP requirements templates; extensive 
capabilities for the customization of guidance and resources by local 
institutional administrators; options for plan visibility; and UI refinements 
based on user feedback and focus group testing. The technical work 
undertaken for the DMPTool Version 2 has been accompanied by a new 
governance structure and the growth of a community of engaged stakeholders 
who will form the basis for a sustainable path forward for the DMPTool as it 
continues to play an important role in research data management activities.


Data management is a timely and increasingly important topic for ecologists.
Recent funder mandates requiring data management plans, combined with the 
data deluge that faces scientists, make education about data management 
critical for any future ecologist. In this study, we surveyed instructors of 
general ecology courses at 48 major institutions in the United States. We 
chose instructors at institutions that are likely to train future ecologists, and 
therefore, are most likely to influence the trajectory of data management 
education in this field. The survey queried instructors about institution and 
course characteristics, the extent to which data-related topics are included in
their courses, the barriers to their teaching these topics, and their own
personal beliefs and values associated with data management and 
stewardship. We found that, in general, data management topics are not being 
covered in undergraduate ecology courses for a wide range of reasons. Most 
often, instructors cited a lack of time and a lack of resources as barriers to 
teaching data management. Although data are used for instruction at some 
point in the majority of the courses surveyed, good data management 
practices and a thorough understanding of the importance of data stewardship 
are not being taught. We offer potential explanations for this and suggestions 
for improvement.


Scientific datasets have immeasurable value, but they lose their value over
time without proper documentation, long-term storage, and easy discovery 
and access. Across disciplines as diverse as astronomy, demography, 
archeology, and ecology, large numbers of small heterogeneous datasets (i.e.,
the long tail of data) are especially at risk unless they are properly 
documented, saved, and shared. One unifying factor for many of these at-risk 
datasets is that they reside in spreadsheets. In response to this need, the 
California Digital Library (CDL) partnered with Microsoft Research 
Connections and the Gordon and Betty Moore Foundation to create the 
DataUp data management tool for Microsoft Excel. Many researchers 
creating these small, heterogeneous datasets use Excel at some point in their 
data collection and analysis workflow, so we were interested in developing a 
data management tool that fits easily into those work flows and minimizes the 
learning curve for researchers. The DataUp project began in August 2011. We 
first formally assessed the needs of researchers by conducting surveys and 
interviews of our target research groups: earth, environmental, and ecological 
scientists. We found that, on average, researchers had very poor data 
management practices, were not aware of data centers or metadata standards, 
and did not understand the benefits of data management or sharing. Based on 
our survey results, we composed a list of desirable components and 
requirements and solicited feedback from the community to prioritize 
potential features of the DataUp tool. These requirements were then relayed [...]


Data curation is attracting a growing interest in the library and information
science community. The main purpose of data curation is to support data 
reuse. This paper discusses the issues of reusing quantitative social science 
data from three perspectives of searching and browsing for datasets, 
evaluating the reusability of datasets (including evaluating topical relevance, 
utility and data quality), and integrating datasets, by comparing dataset 
searching with online database searching. The paper also discusses using
[...] integrating datasets.


This paper describes how the Finnish Ministry of Education and Culture
launched an initiative on research data management and open data, open 
access publishing, and open and collaborative ways of working in 2014. Most 
of the universities and research institutions took part in the collaborative 
initiative building new tools and training material for the Finnish research 
needs. Measures taken by one university, Aalto University, are described in 
detail and analysed, and compared with the activities taking place in other 
universities.


[...] services are described, and their benefits and drawbacks are discussed.


In many domains the rapid generation of large amounts of data is
fundamentally changing how research is done. The deluge of data presents 
great opportunities, but also many challenges in managing, analyzing and 
sharing data. However, good training resources for researchers looking to 
develop skills that will enable them to be more effective and productive 
researchers are scarce and there is little space in the existing curriculum for 
courses or additional lectures. To address this need we have developed an 
introductory two-day intensive workshop, Data Carpentry, designed to teach 
basic concepts, skills, and tools for working more effectively and 
reproducibly with data. 


These workshops are based on Software Carpentry: two-day, hands-on, 
bootcamp style workshops teaching best practices in software development, 
that have demonstrated the success of short workshops to teach foundational 
research skills. Data Carpentry focuses on data literacy in particular, with the 
objective of teaching skills to researchers to enable them to retrieve, view, 
manipulate, analyze and store their and other's data in an open and 
reproducible way in order to extract knowledge from data.


Background


Scientific research in the 21st century is more data intensive and collaborative 
than in the past. It is important to study the data practices of researchers— 
data accessibility, discovery, re-use, preservation and, particularly, data 
sharing. Data sharing is a valuable part of the scientific method allowing for 
verification of results and extending research from prior results. 


Methodology/Principal Findings 


A total of 1329 scientists participated in this survey exploring current data 
sharing practices and perceptions of the barriers and enablers of data sharing.
Scientists do not make their data electronically available to others for various
reasons, including insufficient time and lack of funding. Most respondents are 
satisfied with their current processes for the initial and short-term parts of the 
data or research lifecycle (collecting their research data; searching for, 
describing or cataloging, analyzing, and short-term storage of their data) but 
are not satisfied with long-term data preservation. Many organizations do not 
provide support to their researchers for data management both in the short- 
and long-term. If certain conditions are met (such as formal citation and 
sharing reprints) respondents agree they are willing to share their data. There 
are also significant differences and approaches in data management practices 
based on primary funding agency, subject discipline, age, work focus, and 
world region. 


Conclusions/Significance 


Barriers to effective data sharing and preservation are deeply rooted in the 
practices and culture of the research process as well as the researchers 
themselves. New mandates for data management plans from NSF and other 
federal agencies and world-wide attention to the need to share and preserve 
data could lead to changes. Large scale programs, such as the NSF-sponsored 
DataNET (including projects like DataONE) will both bring attention and 
resources to the issue and make it easier for scientists to apply sound data 
management principles. 


Tenopir, Carol, Suzie Allard, Priyanki Sinha, Danielle Pollock, Jess Newman,
Elizabeth Dalton, Mike Frame, and Lynn Baird. "Data Management Education from
the Perspective of Science Educators." International Journal of Digital Curation
11, no. 1 (2016): 232-251. https://doi.org/10.2218/ijdc.v11i1.389


In order to better understand the current state of data management education 
in multiple fields of science, this study surveyed scientists, including 
information scientists, about their data management education practices, 
including at what levels they are teaching data management, which topics 
they are covering, and what barriers they experience in teaching these topics. We
found that a handful of scientists are teaching data management in 
undergraduate, graduate, and other types of courses, as well as outside of 
classroom settings. Commonly taught data management topics included 
quality control, protecting data, and management planning. However, few 
instructors felt they were covering data management topics thoroughly, and 
respondents cited barriers such as lack of time, lack of necessary expertise,
and lack of information for teaching data management. We offer some
potential explanations for the existing state of data management education 
and suggest areas for further research. 


Tenopir, Carol, Ben Birch, and Suzie Allard. Academic Libraries and Research 
Data Services: Current Practices and Plans for the Future. Chicago: Association 
of College and Research Libraries, 2012. http://www.worldcat.org/oclc/821933602


Tenopir, Carol, Dane Hughes, Suzie Allard, Mike Frame, Ben Birch, Lynn Baird, 
Robert Sandusky, Madison Langseth, and Andrew Lundeen. "Research Data 
Services in Academic Libraries: Data Intensive Roles for the Future?" Journal of 
eScience Librarianship 4, no. 2 (2015): e1085. 
https://doi.org/10.7191/jeslib.2015.1085 


Tenopir, Carol, Robert J. Sandusky, Suzie Allard, and Ben Birch. "Academic 
Librarians and Research Data Services: Preparation and Attitudes." IFLA Journal
39, no. 1 (2013): 70-78. https://doi.org/10.1177/0340035212473089 





Tenopir, Carol, Robert J. Sandusky, Suzie Allard, and Ben Birch. "Research Data
Management Services in Academic Research Libraries and Perceptions of
Librarians." Library & Information Science Research 36, no. 2 (2014): 84-90.
https://doi.org/10.1016/j.lisr.2013.11.003


Tenopir, Carol, Sanna Talja, Wolfram Horstmann, Elina Late, Dane Hughes, 
Danielle Pollock, Birgit Schmidt, Lynn Baird, Robert Sandusky, and Suzie Allard. 
"Research Data Services in European Academic Research Libraries." LIBER 
Quarterly 27, no. 1 (2017): 23-44. http://doi.org/10.18352/lq.10180 


Research data is an essential part of the scholarly record, and management of 
research data is increasingly seen as an important role for academic libraries. 
This article presents the results of a survey of directors of the Association of 
European Research Libraries (LIBER) academic member libraries to discover 
what types of research data services (RDS) are being offered by European 
academic research libraries and what services are planned for the future. 
Overall, the survey found that library directors strongly agree on the 
importance of RDS. As was found in earlier studies of academic libraries in 
North America, more European libraries are currently offering or are planning 
to offer consultative or reference RDS than technical or hands-on RDS. The 
majority of libraries provide support for training in skills related to RDS for 
their staff members. Almost all libraries collaborate with other organizations 
inside their institutions or with outside institutions in order to offer or develop 
policy related to RDS. We discuss the implications of the current state of
RDS in European academic research libraries, and offer directions for future
research. 


Teperek, Marta, Maria J. Cruz, Ellen Verbakel, Jasmin Böhmer, and Alastair
Dunning. "Data Stewardship Addressing Disciplinary Data Management Needs." 
International Journal of Digital Curation 13, no. 1 (2018): 141-149. 
http://www.ijdc.net/article/view/604 


One of the biggest challenges for multidisciplinary research institutions which 
provide data management support to researchers is addressing disciplinary 
differences (Akers and Doty, 2013). Centralised services need to be general 
enough to cater for all the different flavours of research conducted in an 
institution. At the same time, focusing on the common denominator means 
that subject-specific differences and needs may not be effectively addressed. 
In 2017, Delft University of Technology (TU Delft) embarked on an 
ambitious Data Stewardship project, aiming to comprehensively address data 
management needs across a multi-disciplinary campus. In this article we 
describe the principles behind the Data Stewardship project at TU Delft, the 
progress so far, identify the key challenges and explain our plans for the 
future. 


Teplitzky, Samantha. "Open Data, [Open] Access: Linking Data Sharing and 
Article Sharing in the Earth Sciences." Journal of Librarianship and Scholarly 
Communication 5, no. 1 (2017): eP2150. http://doi.org/10.7710/2162-3309.2150


INTRODUCTION The norms of a research community influence practice, 
and norms of openness and sharing can be shaped to encourage researchers 
who share in one aspect of their research cycle to share in another. Different 
sets of mandates have evolved to require that research data be made public, 
but not necessarily articles resulting from that collected data. In this paper, I 
ask to what extent publications in the Earth Sciences are more likely to be 
open access (in all of its definitions) when researchers open their data through 
the Pangaea repository. METHODS Citations from Pangaea data sets were 
studied to determine the level of open access for each article. RESULTS This 
study finds that the proportion of gold open access articles linked to the 
repository increased 25% from 2010 to 2015 and 75% of articles were 
available from multiple open sources. DISCUSSION The context for 
increased preference for gold open access is considered and future work 
linking researchers’ decisions to open their work to the adoption of open 
access mandates is proposed.


Terry, Robert F., Katherine Littler, and Piero L. Olliaro. "Sharing Health Research
Data—the Role of Funders in Improving the Impact." F1000Research 7 (2018):
1641. https://doi.org/10.12688/f1000research.16523.2





Recent public health emergencies with outbreaks of influenza, Ebola and Zika 
revealed that the mechanisms for sharing research data are neither being used
nor adequate for the purpose, particularly where data needs to be shared
rapidly. 


A review of research papers, including completed clinical trials related to 
priority pathogens, found only 31% (98 out of 319 published papers, 
excluding case studies) provided access to all the data underlying the paper— 
65% of these papers give no information on how to find or access the data. 
Only two clinical trials out of 58 on interventions for WHO priority 
pathogens provided any link in their registry entry to the background data. 


Interviews with researchers revealed that reasons for reluctance to share data
included a lack of confidence in the utility of the data; an absence of academic
incentives for rapid dissemination that prevents subsequent publication; and a
disconnect between those who are collecting the data and those who wish to use it
quickly. The role of the funders of research needs to change to address this.
Funders need to engage early with the researchers and related stakeholders to
understand their concerns and work harder to define more explicitly the
benefits to all stakeholders. Secondly, there needs to be a direct benefit to
sharing data that is directly relevant to those people that collect and curate the
data. Thirdly, more work needs to be done to realise the intent of making data
sharing resources more equitable, ethical and efficient. Finally, a checklist of 
the issues that need to be addressed when designing new or revising existing 
data sharing resources should be created. This checklist would highlight the 
technical, cultural and ethical issues that need to be considered and point to 
examples of emerging good practice that can be used to address them. 


Thanos, Costantino. "Research Data Reusability: Conceptual Foundations, Barriers
and Enabling Technologies." Publications 5, no. 1 (2017): 2.
https://doi.org/10.3390/publications5010002 


High-throughput scientific instruments are generating massive amounts of 
data. Today, one of the main challenges faced by researchers is to make the 
best use of the world's growing wealth of data. Data (re)usability is becoming 
a distinct characteristic of modern scientific practice. By data (re)usability, we
mean the ease of using data for legitimate scientific research by one or more
communities of research (consumer communities) that is produced by other 
communities of research (producer communities). Data (re)usability allows 
the reanalysis of evidence, reproduction and verification of results, 
minimizing duplication of effort, and building on the work of others. It has 
four main dimensions: policy, legal, economic and technological. The paper 
addresses the technological dimension of data reusability. The conceptual 
foundations of data reuse as well as the barriers that hamper data reuse are 
presented and discussed. The data publication process is proposed as a bridge 
between the data author and user and the relevant technologies enabling this 
process are presented. 


Thanos, Costantino, Friederike Klan, Kyriakos Kritikos, and Leonardo Candela. 
"White Paper on Research Data Service Discoverability." Publications 5, no. 1 
(2017): 1. https://doi.org/10.3390/publications5010001 


This White Paper reports the outcome of a Workshop on "Research Data 
Service Discoverability" held in the island of Santorini (GR) on 21-22 April 
2016 and organized in the context of the EU funded Project "RDA-E3." The 
Workshop addressed the main technical problems that hamper an efficient 
and effective discovery of Research Data Services (RDSs) based on 
appropriate semantic descriptions of their functional and non-functional 
aspects. In the context of this White Paper, by RDSs are meant those data 
services that manipulate/transform research datasets for the purpose of 
gaining insight into complicated issues. In this White Paper, the main 
concepts involved in the discovery process of RDSs are defined; the RDS 
discovery process is illustrated; the main technologies that enable the 
discovery of RDSs are described; and a number of recommendations are 
formulated for indicating future research directions and making an automatic 
RDS discovery feasible. 


Thelwall, Mike, and Kayvan Kousha. "Do Journal Data Sharing Mandates Work?
Life Sciences Evidence from Dryad." Aslib Journal of Information Management
69, no. 1 (2017): 36-45. https://doi.org/10.1108/AJIM-09-2016-0159


Thessen, Anne E., and David J. Patterson. "Data Issues in the Life Sciences." 
ZooKeys 150 (2011): 15-51. http://dx.doi.org/10.3897/zookeys.150.1766 


Thielen, Joanna, and Amanda Nichols Hess. "Advancing Research Data 
Management in the Social Sciences: Implementing Instruction for Education
Graduate Students into a Doctoral Curriculum." Behavioral & Social Sciences
Librarian 36, no. 1 (2017): 16-30. https://doi.org/10.1080/01639269.2017.1387739


Thoegersen, Jennifer L. "Examination of Federal Data Management Plan 
Guidelines." Journal of eScience Librarianship 4, no. 1 (2015): e1072. 
https://doi.org/10.7191/jeslib.2015.1072 


Thompson, Kristi, and Guoying Liu. "Lives in Data: Prominent Data Librarians, 
Archivists and Educators Share Their Thoughts." International Journal of 
Librarianship 2, no. 1 (2017). https://doi.org/10.23974/ijol.2017.vol2.1.35


We asked several data librarians, archivists and educators who have had 
prominent and interesting careers if they would be willing to let us profile 
them and share some of their thoughts on the field. Six graciously agreed to 
be interviewed via email. Many of our respondents played key roles in 
developing data services and infrastructure in their respective countries, while 
others are involved in building the future of the field through education, 
advancing standards, and advocacy. 


Our virtual panel includes Tuomas J. Alaterä, Finland; Ann Green and Jian
Qin, United States; Guangjing Li, China; Wendy Watkins, Canada; and Lynn 
Woolfrey, South Africa. 


Thompson, Kristi, and Shenqin Yin. "The Development of Academic Data 
Services in Canada and China: Profiles of Data Services at Fudan University and 
the University of Windsor." International Journal of Librarianship 2, no. 1 (2017): 
73-78. https://doi.org/10.23974/ijol.2017.vol2.1.32 


Thomson, Sara Day. Preserving Transactional Data. Glasgow: Digital 
Preservation Coalition, 2016. http://dx.doi.org/10.7207/twr16-02 


Tomaszewski, Robert. "Citations to Chemical Databases in Scholarly Articles: To 
Cite or Not to Cite?" Journal of Documentation 75, no. 6 (2019): 1317-1332.
https://doi.org/10.1108/JD-12-2018-0214 


Toups, Megan, and Michael Hughes. "When Data Curation Isn't: A Redefinition
for Liberal Arts Universities." Journal of Library Administration 53, no. 4 (2013):
223-233. https://doi.org/10.1080/01930826.2013.865386


Treloar, Andrew. "Design and Implementation of the Australian National Data
Service." International Journal of Digital Curation 4, no. 1 (2009): 125-137. 
https://doi.org/10.2218/ijdc.v4i1.83 


This paper will describe the genesis and realisation of the Australian National 
Data Service (ANDS). It will commence by outlining the context within 
which ANDS was conceived, both in the international research and Australian 
research support domains. It will then describe the process that brought about 
the ANDS vision and the principles that informed the realisation of that 
vision. The paper will then outline each of the four ANDS programs 
(Developing Frameworks, Providing Utilities, Seeding the Commons, and 
Building Capabilities) while also discussing particular items of note about the 
approach ANDS is taking. The paper concludes by briefly examining related 
work in the UK and US.


Treloar, Andrew. "The Research Data Alliance: Globally Co-ordinated Action against
Barriers to Data Publishing and Sharing." Learned Publishing 27, no. 5 (2014): 9-13.
https://doi.org/10.1087/20140503


Treloar, Andrew, David Groenewegen, and Cathrine Harboe-Ree. "The Data 
Curation Continuum: Managing Data Objects in Institutional Repositories." D-Lib 
Magazine 13, no. 9/10 (2007). https://doi.org/10.1045/september2007-treloar 


Treloar, Andrew, and Jens Klump. "Updating the Data Curation Continuum."
International Journal of Digital Curation 14, no. 1 (2019): 87-101.
https://doi.org/10.2218/ijdc.v14i1.643


The Data Curation Continuum was developed as a way of thinking about data 
repository infrastructure. Since its original development over a decade ago, a 
number of things have changed in the data infrastructure domain. This paper 
revisits the thinking behind the original data curation continuum and updates 
it to respond to changes in research objects, storage models, and the 
repository landscape in general. 


Treloar, Andrew, and Ross Wilkinson. "Access to Data for eResearch: Designing 
the Australian National Data Service Discovery Services." International Journal of 
Digital Curation 3, no. 2 (2008): 151-158. https://doi.org/10.2218/ijdc.v3i2.66 


Much work on data repositories has derived from effort on document 
repositories. It is our contention that people do not access research data for 
the same reasons that they access research publications. We argue that it is
valuable to understand information needs, both immediate and contextual, in
establishing both what information should be collected, what metadata are 
captured, and what discovery services should be established. We report on the 
information needs that we have collected in our efforts in establishing the 
Australian National Data Service. These needs cover much more than data — 
there are needs for information about the data, their creators, a need for 
overviews, and further requirements to do with proof, collaboration, and 
innovation. We provide an analysis of those needs, and a set of conclusions 
that has led to some implementation decisions for ANDS. 


Treloar, Andrew, and Mingfang Wu. "Provenance in Support of ANDS' Four
Transformations." International Journal of Digital Curation 11, no. 1 (2016): 183-194.
https://doi.org/10.2218/ijdc.v11i1.416


This article introduces the provenance activities that are being carried out at 
the Australia National Data Services (ANDS). Since its beginning, ANDS has 
been promoting four data transformations so that Australia's research data 
become more valuable and reusable by researchers. Among many other 
activities that enable the four transformations, ANDS has been encouraging 
ANDS partners to capture and describe rich context at the time when a data 
collection is created. In 2015, ANDS funded a number of external projects 
that had provenance components. In addition, ANDS is working on the 
interoperability between the schema that is used by the ANDS research data 
registration and discovery service—Research Data Australia (RDA)—and the 
W3C recommended provenance standard, Provenance Ontology (PROV-O), 
and investigating how to enrich the schema to access provenance information. 
The article concludes by discussing the lessons we learnt and our future 
planned activity. 


Trimble, Leanne, Cheryl Woods, Francine Berish, Daniel Jakubek, and Sarah 
Simpkin. "Collaborative Approaches to the Management of Geospatial Data 
Collections in Canadian Academic Libraries: A Historical Case Study." Journal of 
Map & Geography Libraries 11, no. 3 (2015): 330-358. 
https://doi.org/10.1080/15420353.2015.1043067 


Tsoi, Ah Chung, Jeff McDonell, Andrew Treloar, and Ian Atkinson. "Dataset 
Acquisition, Accessibility, Annotation, E-Research Technologies (DART) 
Project." International Journal on Digital Libraries 7, no. 1/2 (2007): 53-55. 
https://doi.org/10.1007/s00799-007-0019-4


Tuyl, Steve Van, and Gabrielle Michalek. "Assessing Research Data Management
Practices of Faculty at Carnegie Mellon University." Journal of Librarianship and
Scholarly Communication 3, no. 3 (2015): eP1258.
http://doi.org/10.7710/2162-3309.1258


INTRODUCTION Recent changes to requirements for research data 
management by federal granting agencies and by other funding institutions 
have resulted in the emergence of institutional support for these requirements. 
At CMU, we sought to formalize assessment of research data management 
practices of researchers at the institution by launching a faculty survey and 
conducting a number of interviews with researchers. METHODS We 
submitted a survey on research data management practices to a sample of 
faculty including questions about data production, documentation, 
management, and sharing practices. The survey was coupled with in-depth 
interviews with a subset of faculty. We also make estimates of the amount of 
research data produced by faculty. RESULTS Survey and interview results 
suggest moderate level of awareness of the regulatory environment around 
research data management. Results also present a clear picture of the types 
and quantities of data being produced at CMU and how these differ among 
research domains. Researchers identified a number of services that they 
would find valuable including assistance with data management planning and 
backup/storage services. We attempt to estimate the amount of data produced 
and shared by researchers at CMU. DISCUSSION Results suggest that 
researchers may need and are amenable to assistance with research data 
management. Our estimates of the amount of data produced and shared have 
implications for decisions about data storage and preservation. 
CONCLUSION Our survey and interview results have offered significant 
guidance for building a suite of services for our institution. 


Ulbricht, Damian, Kirsten Elger, Roland Bertelmann, and Jens Klump. 
"panMetaDocs, eSciDoc, and DOIDB—An Infrastructure for the Curation and 
Publication of File-Based Datasets for GFZ Data Services." ISPRS International
Journal of Geo-Information 5, no. 3 (2016): 25.
http://dx.doi.org/10.3390/ijgi5030025


Ure, Jenny, Tasneem Irshad, Janet Hanley, Angus Whyte, Claudia Pagliari, Hilary 
Pinnock, and Brian McKinstry. "Curating Complex, Dynamic and Distributed 
Data: Telehealth as a Laboratory for Strategy." International Journal of Digital 
Curation 6, no. 2 (2011): 128-145. https://doi.org/10.2218/ijdc.v6i2.207


Telehealth monitoring data is now being collected across large populations of
patients with chronic diseases such as stroke, hypertension, COPD and 
dementia. These large, complex and heterogeneous datasets, including 
distributed sensor and mobile datasets, present real opportunities for 
knowledge discovery and re-use, however they also generate new challenges 
for curation. This paper uses qualitative research with stakeholders in two 
nationally-funded telehealth projects to outline the perceptions, practices and 
preferences of different stakeholders with regard to data curation. Telehealth 
provides a living laboratory for the very different challenges implicit in 
designing and managing data infrastructure for embedded and ubiquitous 
computing. Here, technical and human agents are distributed, and interaction 
and state change is a central component of design, rather than an inconvenient 
challenge to it. The authors argue that there are lessons to be learned from 
other domains where data infrastructure has been radically rethought to 
address these challenges. 


Valentino, Maura, and Michael Boock. "Data Management Services in Academic 
Libraries: A Case Study at Oregon State University." Practical Academic 
Librarianship: The International Journal of the SLA Academic Division 5, no. 2 
(2015): 77-91. https://journals.tdl.org/pal/index.php/pal/article/view/7001 


Libraries have been asked to provide many new services over the past several 
decades. This paper aims to show how data management services were 
incorporated into the services that Oregon State University provides to faculty 
and graduate students. The lessons learned are general and applicable to any 
research institute that needs to manage data or help others with managing 
data. 


This work is licensed under a Creative Commons Attribution 2.5 License, 
https://creativecommons.org/licenses/by/2.5/legalcode. 


Van de Sandt, Stephanie, Sünje Dallmeier-Tiessen, Artemis Lavasa, and Vivien Petras.
"The Definition of Reuse." Data Science Journal 18, no. 1 (2019): 22. 
http://doi.org/10.5334/dsj-2019-022 


The ability to reuse research data is now considered a key benefit for the 
wider research community. Researchers of all disciplines are confronted with 
the pressure to share their research data so that it can be reused. The demand 
for data use and reuse has implications on how we document, publish and 
share research in the first place, and, perhaps most importantly, it affects how
we measure the impact of research, which is commonly a measurement of its
use and reuse. It is surprising that research communities, policy makers, etc. 
have not clearly defined what use and reuse is yet. 


We postulate that a clear definition of use and reuse is needed to establish 
better metrics for a comprehensive scholarly record of individuals, 
institutions, organizations, etc. Hence, this article presents a first definition of 
reuse of research data. Characteristics of reuse are identified by examining the 
etymology of the term and the analysis of the current discourse, leading to a 
range of reuse scenarios that show the complexity of today's research 
landscape, which has been moving towards a data-driven approach. The 
analysis underlines that there is no reason to distinguish use and reuse. We 
discuss what that means for possible new metrics that attempt to cover Open 
Science practices more comprehensively. We hope that the resulting 
definition will enable a better and more refined strategy for Open Science. 


Van den Eynden, Veerle, and Louise Corti. "Advancing Research Data Publishing 
Practices for the Social Sciences: From Archive Activity to Empowering 
Researchers." International Journal on Digital Libraries 18, no. 2 (2017): 113- 
121. https://doi.org/10.1007/s00799-016-0177-3 


Sharing and publishing social science research data have a long history in the 
UK, through long-standing agreements with government agencies for sharing 
survey data and the data policy, infrastructure, and data services supported by 
the Economic and Social Research Council. The UK Data Service and its 
predecessors developed data management, documentation, and publishing 
procedures and protocols that stand today as robust templates for data 
publishing. As the ESRC research data policy requires grant holders to submit 
their research data to the UK Data Service after a grant ends, setting standards 
and promoting them has been essential in raising the quality of the resulting 
research data being published. In the past, received data were all processed, 
documented, and published for reuse in-house. Recent investments have 
focused on guiding and training researchers in good data management 
practices and skills for creating shareable data, as well as a self-publishing 
repository system, ReShare. ReShare also receives data sets described in 
published data papers and achieves scientific quality assurance through peer 
review of submitted data sets before publication. Social science data are 
reused for research, to inform policy, in teaching and for methods learning. 
Over a 10 years period, responsive developments in system workflows, access 
control options, persistent identifiers, templates, and checks, together with
targeted guidance for researchers, have helped raise the standard of self-publishing
social science data. Lessons learned and developments in shifting
publishing social science data from an archivist responsibility to a researcher 
process are showcased, as inspiration for institutions setting up a data 
repository. 


Van Deventer, Martie, and Heila Pienaar. "Research Data Management in a 
Developing Country: A Personal Journey." International Journal of Digital
Curation 10, no. 2 (2015): 33-47. https://doi.org/10.2218/ijdc.v10i2.380 


This paper explores our own journey to get to grips with research data 
management (RDM). It also mentions the overlap between our own 'journeys'
and that of the country. We share the lessons that we learnt along the way— 
the most important lesson being that you can learn many wonderful and 
valuable RDM lessons from the international trend setters, but in the end you 
need to get your hands dirty and get the work done yourself. You must, within 
the set parameters, implement the RDM practice that is both appropriate and 
acceptable for and to your own set of researchers—who may be conducting 
research in a context that may be very dissimilar to that of international peers. 


Van Horik, René, and Dirk Roorda. "Migration to Intermediate XML for 
Electronic Data (MIXED): Repository of Durable File Format Conversions." 
International Journal of Digital Curation 6, no. 2 (2011): 245-252. 
https://doi.org/10.2218/ijdc.v6i2.200 


Van Loon, James E., Katherine G. Akers, Cole Hudson, and Alexandra Sarkozy.
"Quality Evaluation of Data Management Plans at a Research University." IFLA
Journal 43, no. 1 (2017): 98-104. https://doi.org/10.1177/0340035216682041


Van Tuyl, Steven, and Amanda L. Whitmire. "Water, Water, Everywhere: 
Defining and Assessing Data Sharing in Academia." PLOS ONE 11, no. 2 (2016): 
e0147942. https://doi.org/10.1371/journal.pone.0147942 


Sharing of research data has begun to gain traction in many areas of the 
sciences in the past few years because of changing expectations from the 
scientific community, funding agencies, and academic journals. National 
Science Foundation (NSF) requirements for a data management plan (DMP) 
went into effect in 2011, with the intent of facilitating the dissemination and 
sharing of research results. Many projects that were funded during 2011 and 
2012 should now have implemented the elements of the data management 
plans required for their grant proposals. In this paper we define 'data sharing'
and present a protocol for assessing whether data have been shared and how
effective the sharing was. We then evaluate the data sharing practices of 
researchers funded by the NSF at Oregon State University in two ways: by 
attempting to discover project-level research data using the associated DMP 
as a starting point, and by examining data sharing associated with journal
articles that acknowledge NSF support. Sharing at both the project level and 
the journal article level was not carried out in the majority of cases, and when 
sharing was accomplished, the shared data were often of questionable 
usability due to access, documentation, and formatting issues. We close the 
article by offering recommendations for how data producers, journal 
publishers, data repositories, and funding agencies can facilitate the process 
of sharing data in a meaningful way. 


Van Zeeland, Hilde, and Jacquelijn Ringersma. "The Development of a Research 
Data Policy at Wageningen University & Research: Best Practices as a 
Framework." LIBER Quarterly 27, no. 1 (2017): 153-170. 
https://doi.org/10.18352/lq.10215 


The current case study describes the development of a Research Data 
Management policy at Wageningen University & Research, the Netherlands. 
To develop this policy, an analysis was carried out of existing frameworks 
and principles on data management (such as the FAIR principles), as well as 
of the data management practices in the organisation. These practices were 
defined through interviews with research groups. Using criteria drawn from 
the existing frameworks and principles, certain research groups were 
identified as 'best-practices': cases where data management was meeting the 
most important data management criteria. These best-practices were then used 
to inform the RDM policy. This approach shows how engagement with 
researchers can not only provide insight into their data management practices 
and needs, but directly inform new policy guidelines. 


Vanden-Hehir, Sally, Helena Cousijn, and Hesham Attalla. "Research Data
Management Practices: Synergies and Discords between Researchers and 
Institutions." International Journal of Digital Curation 13, no. 1 (2018): 73-90. 
http://www.ijdc.net/article/view/499 


The aim of this study was to explore the synergies and discords in attitudes 
towards research data management (RDM) drivers and barriers for both 
researchers and institutions. Previous work has studied RDM from a single 
perspective, but not compared researchers’ and institutions’ perspectives. We
carried out qualitative interviews with researchers as well as institutional
representatives to identify drivers and barriers, and to explore synergies and
RDM needs.


Background


There is wide agreement in the biomedical research community that research 
data sharing is a primary ingredient for ensuring that science is more 
transparent and reproducible. Publishers could play an important role in 
facilitating and enforcing data sharing; however, many journals have not yet 
implemented data sharing policies and the requirements vary widely across 
journals. This study set out to analyze the pervasiveness and quality of data 
sharing policies in the biomedical literature. 


Methods 


The online author's instructions and editorial policies for 318 biomedical 
journals were manually reviewed to analyze the journal's data sharing 
requirements and characteristics. The data sharing policies were ranked using 
a rubric to determine if data sharing was required, recommended, required 
only for omics data, or not addressed at all. The data sharing method and 
licensing recommendations were examined, as well any mention of 
reproducibility or similar concepts. The data was analyzed for patterns 
relating to publishing volume, Journal Impact Factor, and the publishing 
model (open access or subscription) of each journal. 


Results 


A total of 11.9% of journals analyzed explicitly stated that data sharing was 
required as a condition of publication. A total of 9.1% of journals required 
data sharing, but did not state that it would affect publication decisions. 
23.3% of journals had a statement encouraging authors to share their data but 
did not require it. A total of 9.1% of journals mentioned data sharing 
indirectly, and only 14.8% addressed protein, proteomic, and/or genomic data 
sharing. There was no mention of data sharing in 31.8% of journals. Impact 
factors were significantly higher for journals with the strongest data sharing 
policies compared to all other data sharing criteria. Open access journals were 
not more likely to require data sharing than subscription journals.


Our study confirmed earlier investigations which observed that only a
minority of biomedical journals require data sharing, and a significant 
association between higher Impact Factors and journals with a data sharing 
requirement. Moreover, while 65.7% of the journals in our study that required 
data sharing addressed the concept of reproducibility, as with earlier 
investigations, we found that most data sharing policies did not provide 
specific guidance on the practices that ensure data is maximally available and 
reusable. 


Verbaan, E., and A.M. Cox. "Occupational Sub-Cultures, Jurisdictional Struggle 
and Third Space: Theorising Professional Service Responses to Research Data 
Management." The Journal of Academic Librarianship 40, no. 3-4 (2014): 211- 
219. http://dx.doi.org/10.1016/j.acalib.2014.02.008 


Verhaar, Peter, Fieke Schoots, Laurents Sesink, and Floor Frederiks. "Fostering 
Effective Data Management Practices at Leiden University." LIBER Quarterly 27, 
no. 1 (2017): 1-22. http://doi.org/10.18352/lq.10185


At Leiden University, it is increasingly recognised that effective data 
management forms an integral component of responsible research. To 
actively promote the stewardship of all the research data that are produced at 
Leiden University, a comprehensive, institution-wide programme was 
launched in 2015, which centrally aims to encourage its researchers to 
carefully plan the temporal storage, long-term preservation and potential 
reuse of their data. This programme, which is managed centrally by the 
Department of Academic Affairs, and which receives important contributions 
from academic staff, from Leiden University Libraries, and from the 
University’s central ICT organisation, basically consists of three parts. Firstly, 
a basic central policy has been formulated, containing clear guidelines for 
activities before, during and after research projects. The central aim of this 
institutional policy is to ensure that all Leiden-based research projects can 
effectively comply with the most common requirements stipulated by funding 
agencies, academic publishers, the Dutch standard evaluation protocol and the 
European data protection directive. As a second part of the data management 
programme, faculties have organised workshops and meetings, concentrating 
on the rationale and on the technical and organisational practicalities of 
effective data management in order to bring about a discipline-specific 
protocol. Data librarians employed by Leiden University Libraries have 
developed educational materials and provide training for PhDs in the 
principles and benefits of good data management. Thirdly, to ensure that
scholars can genuinely make a reasoned selection among the many tools that
are currently available, a central catalogue was developed which lists and 
characterises the most relevant data management services. The catalogue 
currently provides information about, amongst many other aspects, the 
organisations behind these services, the main academic disciplines which are 
targeted and the accepted file formats and metadata formats. The various 
aspects of these facilities have been classified using terminology provided by 
conceptual models developed by the UKDA, ANDS and the DCC. Using 
Leiden University’s policy guidelines as criteria, the overall suitability of 
each service has also been evaluated. Leiden University’s data management 
programme has a total duration of three years, and its basic objective is to 
offer a comprehensive form of support, in which the data management policy 
which is propagated centrally is complemented by various forms of assistance 
which ought to make it easier for scholars to adhere to this policy. The 
catalogue of data management services also aims to bolster the 
implementation of an adequate technical infrastructure, as the qualitative 
evaluations of the services enable policy-makers and developers to quickly 
establish gaps or other shortcomings within existing facilities.


In this paper we summarize the findings of an empirical study conducted by
the EDaWaX Project. 141 economics journals were examined regarding the 
quality and extent of data availability policies that should support replications 
of published empirical results in economics. This paper suggests criteria for 
such policies that aim to facilitate replications. These criteria were also used 
for analysing the data availability policies we found in our sample and to 
identify best practices for data policies of scholarly journals in economics. In
addition, we also evaluated the journals' data archives and checked the
percentage of articles associated with research data. To conclude, an appraisal 
as to how scientific libraries might support the linkage of publications to 
underlying research data in cooperation with researchers, editors, publishers 
and data centres is presented.


This paper summarizes the findings of an analysis of scientific infrastructure
service providers (mainly from Germany but also from other European 
countries). These service providers are evaluated with regard to their potential 
services for the management of publication-related research data in the field 
of social sciences, especially economics. For this purpose we conducted both 
desk research and an online survey of 46 research data centres (RDCs), 
library networks and public archives; almost 48% responded to our survey. 
We find that almost three-quarters of all respondents generally store 
externally generated research data—which also applies to publication-related 
data. Almost 75% of all respondents also store and host the code of 
computation or the syntax of statistical analyses. If self-compiled software 
components are used to generate research outputs, only 40% of all 
respondents accept these software components for storing and hosting. Eight 
out of ten institutions also take specific action to ensure long-term data 
preservation. With regard to the documentation of stored and hosted research 
data, almost 70% of respondents claim to use the metadata schema of the 
Data Documentation Initiative (DDI); Dublin Core is used by 30 percent 
(multiple answers were permitted). Almost two-thirds also use persistent 
identifiers to facilitate citation of these datasets. Three in four also support 
researchers in creating metadata for their data. Application programming 
interfaces (APIs) for uploading or searching datasets currently are not yet 
implemented by any of the respondents. Least common is the use of semantic 
technologies like RDF. 


Concluding, the paper discusses the outcome of our survey in relation to
Research Data Centres (RDCs) and the roles and responsibilities of
publication-related data archives for journals in the fields of social sciences. 


Voß, Viola, and Hamrin, Göran. "Quadcopters or Linguistic Corpora: Establishing
RDM Services for Small-Scale Data Producers at Big Universities." LIBER 
Quarterly 28, no. 1 (2018): 1-58. http://doi.org/10.18352/lq.10255 


During an international library conference in 2017 the authors had many 
productive exchanges about similarities and differences in Swedish and 
German higher-education libraries. Since research data management (RDM) 
is an emerging topic on both sides of the Baltic Sea, we find it valuable to 
compare strategies, services, and workflows to learn from each other's 
practices. 


Aim: In this paper, we aim to compare the practices and needs of small-scale 
data producers in engineering and the humanities. In particular, we try to 
answer the following research questions: 


What kind of data do the small-scale data producers produce? 
What do these producers need in terms of RDM support? 
What then can we librarians help them with? 


Hypothesis: Our research hypothesis is that small-scale data producers have 
similar needs in engineering and the humanities. This hypothesis is based on 
the similarities in demands from funding agencies on (open) research data and 
on the assumption that research in different subjects often creates results 
which are different in content but similar in structure. 


Method: We study the current strategies, practices, and services of our 
respective universities (KTH Royal Institute of Technology Stockholm and 
Westfälische Wilhelms-Universität Münster). We also study the work and
initiatives done on a more advanced level by universities, libraries, and other 
organisations in Sweden and Germany. 


Results: The paper will give an overview of how we did the groundwork for 
the initial services provided by our libraries. We focus on what we are doing 
and why we are doing it. We find that we are following in the leading 
footsteps of other university libraries. The experiences shared by colleagues 
help us to adapt their best practices to our local demands, making them better 
practices for KTH and WWU researchers. 


Limitation: We restrict ourselves to studying only researchers who create data
on a small scale, since the large-scale data producers handle the RDM on their 
own. 


relationship between the data objects and map the archaeological
documentation process. The metadata implicit in the record-keeping system is 
automatically extracted upon ingest, combined with additional sources of 
metadata, and stored alongside the data in the iRods preservation 
environment. This method enables a more organized workflow for the 
researchers, helps them archive their data close to the moment of data 
creation, and avoids error prone manual metadata input. We describe the 
types of metadata extracted and provide technical details of the extraction 
process and storage of the data and metadata. 


Data sharing is a difficult process for both the data producer and the data
reuser. Both parties are faced with more disincentives than incentives. Data 
producers need to sink time and resources into adding metadata for data to be 
findable and usable, and there is no promise of receiving credit for this effort. 
Making data available also leaves data producers vulnerable to being scooped
or data misuse. Data reusers also need to sink time and resources into 
evaluating data and trying to understand them, making collecting their own 
data a more attractive option. In spite of these difficulties, some data 
producers are looking for new ways to make data sharing and reuse a more 
viable option. This paper presents two cases from the surface and climate 
modeling communities, where researchers who produce data are reaching out 
to other researchers who would be interested in reusing the data. These cases 
are evaluated as a strategy to identify ways to overcome the challenges 
typically experienced by both data producers and data reusers. By working 
together with reusers, data producers are able to mitigate the disincentives and 
create incentives for sharing data. By working with data producers, data 
reusers are able to circumvent the hurdles that make data reuse so 
challenging. 


The success of eScience research depends not only upon effective
collaboration between scientists and technologists but also upon the active 
involvement of data archivists. Archivists rarely receive scientific data until 
findings are published, by which time important information about their 
origins, context, and provenance may be lost. Research reported here 
addresses the life cycle of data from collaborative ecological research with 
embedded networked sensing technologies. A better understanding of these 
processes will enable archivists to participate in earlier stages of the life cycle 
and to improve curation of these types of scientific data. Evidence from our 
interview study and field research yields a nine-stage life cycle. Among the 
findings are the cumulative effect of decisions made at each stage of the life 
cycle; the balance of decision-making between scientific and technology 
research partners; and the loss of certain types of data that may be essential to 
later interpretation. 


Research on practices to share and reuse data will inform the design of
infrastructure to support data collection, management, and discovery in the 
long tail of science and technology. These are research domains in which data 
tend to be local in character, minimally structured, and minimally 
documented. We report on a ten-year study of the Center for Embedded 
Network Sensing (CENS), a National Science Foundation Science and 
Technology Center. We found that CENS researchers are willing to share 
their data, but few are asked to do so, and in only a few domain areas do their 
funders or journals require them to deposit data. Few repositories exist to 
accept data in CENS research areas. Data sharing tends to occur only through 
interpersonal exchanges. CENS researchers obtain data from repositories, and 
occasionally from registries and individuals, to provide context, calibration, 
or other forms of background for their studies. Neither CENS researchers nor 
those who request access to CENS data appear to use external data for 
primary research questions or for replication of studies. CENS researchers are 
willing to share data if they receive credit and retain first rights to publish 
their results. Practices of releasing, sharing, and reusing of data in CENS 
reaffirm the gift culture of scholarship, in which goods are bartered between 
trusted colleagues rather than treated as commodities. 


NASA's Earth Science Data and Information System (ESDIS) Project began
investigating the use of Digital Object Identifiers (DOIs) in 2010 with the 
goal of assigning DOIs to various data products. These Earth science research 
data products produced using Earth observations and models are archived and 
distributed by twelve Distributed Active Archive Centers (DAACs) located 
across the United States. Each data center serves a different Earth science 
discipline user community and, accordingly, has a unique approach and 
process for generating and archiving a variety of data products. These varied 
approaches present a challenge for developing a DOI solution. To address this 
challenge, the ESDIS Project has developed processes, guidelines, and several 
models for creating and assigning DOIs. Initially the DOI assignment and 
registration process was started as a prototype but now it is fully operational. 


Since 2006 the education authorities in Switzerland have been obliged by the
Constitution to harmonize important benchmarks in the educational system 
throughout Switzerland. With the development of national educational 
objectives in four disciplines an important basis for the implementation of this 
constitutional mandate was created. In 2013 the Swiss National Core Skills 
Assessment Program. . . was initiated to investigate the skills of students, 
starting with three of four domains: mathematics, language of teaching and 
first foreign language in grades 2, 6 and 9. UGK uses a computer-based test 
and a sample size of 25.000 students per year. 


A huge challenge for computer-based educational assessment is the research 
data management process. Data from several different systems and tools 
existing in different formats has to be merged to obtain data products 
researchers can utilize. The long term preservation has to be adapted as well. 


INTRODUCTION As data-driven research becomes the norm, practical
knowledge in data stewardship is critical for researchers. Despite its growing 
importance, formal education in research data management (RDM) is rare at 
the university level. Academic librarians are now playing a leadership role in 
developing and providing RDM training and support to faculty and graduate 
students. This case study describes the development and implementation of a 
new, credit-bearing course in RDM for graduate students from all disciplines. 
DESCRIPTION OF PROGRAM The purpose of the course was to enable 
students to acquire foundational knowledge and skills in RDM that would 
support long-term habits in the planning, management, preservation, and 
sharing of research data. The pedagogical approach for the course combined
outcomes centered course design with active learning techniques. Periodic 
course assessment was performed through anonymous student surveys, with 
the objective of gauging course efficacy and quality, and to obtain suggested 
modifications or improvements. These assessment results indicated that the 
course content and scope were appropriate and that the active learning 
approach was effective. Assessments of student learning demonstrated that all 
major learning objectives were achieved. NEXT STEPS Information derived 
from the student surveys was used to determine how the course could be 
modified to improve student experience and the overall quality of the course 
and the instruction. 


We report on an exploratory study consisting of brief case studies in selected
disciplines, examining what motivates researchers to work (or want to work) 
in an open manner with regard to their data, results and protocols, and 
whether advantages are delivered by working in this way. We review the 
policy background to open science, and literature on the benefits attributed to 
open data, considering how these relate to curation and to questions of who 
participates in science. The case studies investigate the perceived benefits to 
researchers, research institutions and funding bodies of utilizing open 
scientific methods, the disincentives and barriers, and the degree to which 
there is evidence to support these perceptions. Six case study groups were 
selected in astronomy, bioinformatics, chemistry, epidemiology, language 
technology and neuroimaging. The studies identify relevant examples and 
issues through qualitative analysis of interview transcripts. We provide a 
typology of degrees of open working across the research lifecycle, and 
conclude that better support for open working, through guidelines to assist 
research groups in identifying the value and costs of working more openly,
and further research to assess the risks, incentives and shifts in responsibility 
entailed by opening up the research process are needed. 


Background


The widespread reluctance to share published research data is often 
hypothesized to be due to the authors’ fear that reanalysis may expose errors 
in their work or may produce conclusions that contradict their own. However, 
these hypotheses have not previously been studied systematically. 


Methods and Findings 


We related the reluctance to share research data for reanalysis to 1148 
statistically significant results reported in 49 papers published in two major 
psychology journals. We found the reluctance to share data to be associated 
with weaker evidence (against the null hypothesis of no effect) and a higher 
prevalence of apparent errors in the reporting of statistical results. The 
unwillingness to share data was particularly clear when reporting errors had a 
bearing on statistical significance. 


Conclusions 


Our findings on the basis of psychological papers suggest that statistical 
results are particularly hard to verify when reanalysis is more likely to lead to 
contrasting conclusions. This highlights the importance of establishing 
mandatory data archiving policies. 


Wilcox, David. "Supporting FAIR Data Principles with Fedora." LIBER Quarterly
28, no. 1 (2018): 1-8. http://doi.org/10.18352/lq.10247


Making data findable, accessible, interoperable, and re-usable is an important 
but challenging goal. From an infrastructure perspective, repository 
technologies play a key role in supporting FAIR data principles. Fedora is a 
flexible, extensible, open source repository platform for managing, 
preserving, and providing access to digital content. Fedora is used in a wide 
variety of institutions including libraries, museums, archives, and government
organizations. Fedora provides native linked data capabilities and a modular 
architecture based on well-documented APIs and ease of integration with 
existing applications. As both a project and a community, Fedora has been 
increasingly focused on research data management, making it well-suited to 
supporting FAIR data principles as a repository platform. Fedora provides 
strong support for persistent identifiers, both by minting HTTP URIs for each 
resource and by allowing any number of additional identifiers to be associated 
with resources as RDF properties. Fedora also supports rich metadata in any 
schema that can be indexed and disseminated using a variety of protocols and 
services. As a linked data server, Fedora allows resources to be semantically 
linked both within the repository and on the broader web. Along with these 
and other features supporting research data management, the Fedora 
community has been actively participating in related initiatives, most notably 
the Research Data Alliance. Fedora representatives participate in a number of 
interest and working groups focused on requirements and interoperability for 
research data repository platforms. This participation allows the Fedora 
project to both influence and be influenced by an international group of 
Research Data Alliance stakeholders. This paper will describe how Fedora 
supports FAIR data principles, both in terms of relevant features and 
community participation in related initiatives. 


"Assessing Research Data Deposits and Usage Statistics within IDEALS."
Journal of eScience Librarianship 6, no. 2 (2017): e1112. 
https://doi.org/10.7191/jeslib.2017.1112 


Objectives: This study follows up on previous work that began examining 
data deposited in an institutional repository. The work here extends the earlier 
study by answering the following lines of research questions: (1) What is the 
file composition of datasets ingested into the University of Illinois at Urbana- 
Champaign (UIUC) campus repository? Are datasets more likely to be single- 
file or multiple-file items? (2) What is the usage data associated with these 
datasets? Which items are most popular? 


Methods: The dataset records collected in this study were identified by
filtering item types categorized as "data" or "dataset" using the advanced 
search function in IDEALS. Returned search results were collected in an 
Excel spreadsheet to include data such as the Handle identifier, date ingested, 
file formats, composition code, and the download count from the item's 
statistics report. The Handle identifier represents the dataset record's 
persistent identifier. Composition represents codes that categorize items as 
single or multiple file deposits. Date available represents the date the dataset 
record was published in the campus repository. Download statistics were 
collected via a website link for each dataset record and indicates the number 
of times the dataset record has been downloaded. Once the data was collected, 
it was used to evaluate datasets deposited into IDEALS. 


Results: A total of 522 datasets were identified for analysis covering the 
period between January 2007 and August 2016. This study revealed two 
influxes occurring during the period of 2008-2009 and in 2014. During the 
first timeframe a large number of PDFs were deposited by the Illinois 
Department of Agriculture. Whereas, Microsoft Excel files were deposited in 
2014 by the Rare Books and Manuscript Library. Single-file datasets clearly 
dominate the deposits in the campus repository. The total download count for 
all datasets was 139,663 and the average downloads per month per file across 
all datasets averaged 3.2. 


Conclusion: Academic librarians, repository managers, and research data 
services staff can use the results presented here to anticipate the nature of 
research data that may be deposited within institutional repositories. With 
increased awareness, content recruitment, and improvements, IRs can provide 
a viable cyberinfrastructure for researchers to deposit data, but much can be 
learned from the data already deposited. Awareness of trends can help 
librarians facilitate discussions with researchers about research data deposits 
as well as better tailor their services to address short-term and long-term 
research needs. 


Scientists to Curate at Source." International Journal of Digital Curation 12, no. 2
(2017): 1-25. https://doi.org/10.2218/ijdc.v12i2.514


Computers and computation have become essential to scientific activity and 
significant amounts of data are now captured digitally or even "born digital". 
Consequently, there is more and more incentive to capture the full experiment 
records using digital tools, such as Electronic Laboratory Notebooks (ELNs), 
to enable the effective linking and publication of experiment design and 
methods with the digital data that is generated as a result. Inclusion of 
metadata for experiment records helps with providing access, effective 
curation, improving search, and providing context, and further enables 
effective sharing, collaboration, and reuse. 


Regrettably, just providing researchers with the facility to add metadata to 
their experiment records does not mean that they will make use of it, or if 
they do, that the metadata they add will be relevant and useful. Our research 
has clearly indicated that researchers need support and tools to encourage 
them to create effective metadata. Tools, such as ELNs, provide an 
opportunity to encourage researchers to curate their records during their 
creation, but can also add extra value, by making use of the metadata that is 
generated to provide capabilities for research management and Open Science 
that extend far beyond what is possible with paper notebooks. 


The Southampton Chemical Information group, has, for over fifteen years, 
investigated the use of the Web and other tools for the collection, curation, 
dissemination, reuse, and exploitation of scientific data and information. As 
part of this activity we have developed a number of ELNs, but a primary 
concern has been how best to ensure that the future development of such tools 
is both usable and useful to researchers and their communities, with a focus 
on curation at source. In this paper, we describe a number of user research 
and user studies to help answer questions about how our community makes 
use of tools and how we can better facilitate the capture and curation of 
experiment records and the related resources. 


Data Management at the University of Oxford." Ariadne, no. 65 (2010).


International Journal of Digital Curation 8, no. 2 (2013): 235-246.


Since presenting a paper at the International Digital Curation Conference
2010 conference entitled 'An Institutional Approach to Developing Research 
Data Management Infrastructure', the University of Oxford has come a long
way in developing research data management (RDM) policy, tools and 
training to address the various phases of the research data lifecycle. Work has 
now begun on integrating these various elements into a unified infrastructure 
for the whole university, under the aegis of the Data Management Roll-out at 
Oxford (Damaro) Project. 


This paper will explain the process and motivation behind the project, and 
describes our vision for the future. It will also introduce the new tools and 
processes created by the university to tie the individual RDM components 
together. Chief among these is the 'DataFinder'—a hierarchically-structured 
metadata cataloguing system which will enable researchers to search for and 
locate research datasets hosted in a variety of different datastores from 
institutional repositories, through Web 2 services, to filing cabinets standing 
in department offices. DataFinder will be able to pull and associate research 
metadata from research information databases and data management plans, 
and is intended to be CERIF compatible. DataFinder is being designed so that 
it can be deployed at different levels within different contexts, with higher- 
level instances harvesting information from lower-level instances enabling, 
for example, an academic department to deploy one instance of DataFinder, 
which can then be harvested by another at an institutional level, which can 
then in turn be harvested by another at a national level. 


The paper will also consider the requirements of embedding tools and training 
within an institution and address the difficulties of ensuring the sustainability 
of an RDM infrastructure at a time when funding for such endeavours is 
limited. Our research shows that researchers (and indeed departments) are at 
present not exposed to the true costs of their (often suboptimal) data 
management solutions, whereas when data management services are centrally
provided the full costs are visible and off-putting. There is, therefore, the need 
to sell the benefits of centrally-provided infrastructure to researchers. 
Furthermore, there is a distinction between training and services that can be 
most effectively provided at the institutional level, and those which need to be 
provided at the divisional or departmental level in order to be relevant and 
applicable to researchers. This is being addressed in principle by Oxford's 
research data management policy, and in practice by the planning and piloting 
aspects of the Damaro Project. 


There is worldwide interest in the potential of open science to increase the
quality, impact, and benefits of science and research. More recently, attention 
has been focused on aspects such as transparency, quality, and provenance, 
particularly in regard to data. For industry, citizens, and other researchers to 
participate in the open science agenda, further work needs to be undertaken to 
establish trust in research environments. Based on a critical review of the 
literature, this paper examines the issue of trust in an open science 
environment, using virtual laboratories as the focus for discussion. A trust 
framework, which has been developed from an end-user perspective, is 
proposed as a model for addressing relevant issues within online research data
services and tools. 


A substantial amount of data is collected through surveys conducted in Africa
by national statistics offices, international donor organisations, research 
institutions, and the private sector. Data management at African national 
statistics offices is hampered by limited resources. An option for data curation 
in African countries is the establishment of dedicated institutions for data 
preservation and dissemination, such as survey data archives, and research 
data centres. DataFirst, at the University of Cape Town, has established an 
African data service and is helping to improve African data curation practices 
through providing data, promoting free curation tools, and undertaking data 
management training in African countries. 


As data repositories make more data openly available it becomes challenging
for researchers to find what they need either from a repository or through web 
search engines. This study attempts to investigate data users' requirements and 
the role that data repositories can play in supporting data discoverability by 
meeting those requirements. We collected 79 data discovery use cases (or 
data search scenarios), from which we derived nine functional requirements 
for data repositories through qualitative analysis. We then applied usability 
heuristic evaluation and expert review methods to identify best practices that 
data repositories can implement to meet each functional requirement. We 
propose the following ten recommendations for data repository operators to 
consider for improving data discoverability and user's data search experience: 


1. Provide a range of query interfaces to accommodate various data search 
behaviours. 


2. Provide multiple access points to find data. 


3. Make it easier for researchers to judge relevance, accessibility and 
reusability of a data collection from a search summary. 


4. Make individual metadata records readable and analysable. 

5. Enable sharing and downloading of bibliographic references. 

6. Expose data usage statistics. 

7. Strive for consistency with other repositories. 

8. Identify and aggregate metadata records that describe the same data object. 


9. Make metadata records easily indexed and searchable by major web search 
engines. 


10. Follow API search standards and community adopted vocabularies for 
interoperability. 


Within information systems, a significant aspect of search and retrieval across
information objects, such as datasets, journal articles, or images, relies on the 
identity construction of the objects. This paper uses identity to refer to the 
qualities or characteristics of an information object that make it definable and 
recognizable, and can be used to distinguish it from other objects. Identity, in 
this context, can be seen as the foundation from which citations, metadata and 
identifiers are constructed. 


In recent years the idea of including datasets within the scientific record has 
been gaining significant momentum, with publishers, granting agencies and 
libraries engaging with the challenge. However, the task has been fraught 
with questions of best practice for establishing this infrastructure, especially 
in regards to how citations, metadata and identifiers should be constructed. 
These questions suggest a problem with how dataset identities are formed,
such that an engagement with the definition of datasets as conceptual objects 
is warranted. 


This paper explores some of the ways in which scientific data is an unruly and 
poorly bounded object, and goes on to propose that in order for datasets to 
fulfill the roles expected for them, the following identity functions are 
essential for scholarly publications: (i) the dataset is constructed as a 
semantically and logically concrete object, (ii) the identity of the dataset is
embedded, inherent and/or inseparable, (iii) the identity embodies a 
framework of authorship, rights and limitations, and (iv) the identity 
translates into an actionable mechanism for retrieval or reference. 


Digital/data curation curricula have been around for a couple of decades.
Currently, several ALA-accredited LIS programs offer digital/data curation 
courses and certificate programs to address the high demand for professionals 
with the knowledge and skills to handle digital content and research data in an 
ever-changing information environment. In this study, we aimed to examine 
the topical scopes of digital/data curation curricula in the context of the LIS
field. We collected 16 syllabi from the digital/data curation courses, as well as 
textual descriptions of the 11 programs and their core courses offered in the 
U.S., Canada, and the U.K. The collected data were analyzed using a 
probabilistic topic modeling technique, Latent Dirichlet Allocation, to 
identify both common and unique topics. The results are the identification of 
20 topics both at the program- and course-levels. Comparison between the 
program- and course-level topics uncovered a set of unique topics, and a 
number of common topics. Furthermore, we provide interactive visualizations 
for digital/data curation programs and courses for further analysis of topical 
distributions. We believe that our combined approach of a topic modeling and 
visualizations may provide insight for identifying emerging trends and co- 
occurrences of topics among digital/data curation curricula in the LIS field. 


In the 21st century, digital data drive innovation and decision-making in
nearly every field. However, little is known about the total size, 
characteristics, and sustainability of these data. In the scholarly sphere, it is 
widely suspected that there is a gap between the amount of valuable digital 
data that is produced and the amount that is effectively stewarded and made 
accessible. The Stewardship Gap Project (http://bit.ly/stewardshipgap) 
investigates characteristics of, and measures, the stewardship gap for 
sponsored scholarly activity in the United States. This paper presents a 
preliminary definition of the stewardship gap based on a review of relevant 
literature and investigates areas of the stewardship gap for which metrics have 
been developed and measurements made, and where work to measure the 
stewardship gap is yet to be done. The main findings presented are 1) there is 
not one stewardship gap but rather multiple "gaps" that contribute to whether 
data is responsibly stewarded; 2) there are relationships between the gaps that 
can be used to guide strategies for addressing the various stewardship gaps; 
and 3) there are imbalances in the types and depths of studies that have been 
conducted to measure the stewardship gap. 


Yu, Fei, Rebecca Deuble, and Helen Morgan. "Designing Research Data 
Management Services Based on the Research Lifecycle—A Consultative 
Leadership Approach." Journal of the Australian Library and Information 
Association 66, no. 3 (2017): 287-298. 
http://www.tandfonline.com/doi/abs/10.1080/24750158.2017.1364835 


Yu, Holly H. "The Role of Academic Libraries in Research Data Service (RDS) 
Provision: Opportunities and Challenges." The Electronic Library 35, no. 4 (2017): 
783-797. https://doi.org/10.1108/EL-10-2016-0233 


Yu, Siu Hong. "Research Data Management: A Library Practitioner's Perspective." 
Public Services Quarterly 13, no. 1 (2017): 48-54. 
https://doi.org/10.1080/15228959.2016.1223475 


Yu, Janice Chen Kung, and Sandy Campbell. "What Not to Keep: Not All Data 
Have Future Research Value." Journal of the Canadian Health Libraries 
Association 37, no. 2 (2016): 53-57. https://doi.org/10.5596/c16-013 


Zborowski, Mary. "Data Management Activities of Canada's National Science 
Library—2010 Update and Prospective." Data Science Journal 9 (2011): 100-106. 
https://doi.org/10.2481/dsj.009-026 


NRC-CISTI serves Canada as its National Science Library (as mandated by 
Canada's Parliament in 1924) and also provides direct support to researchers 
of the National Research Council of Canada (NRC). By reason of its mandate, 
vision, and strategic positioning, NRC-CISTI has been rapidly and effectively 
mobilizing Canadian stakeholders and resources to become a lead player on 
both the Canadian national and international scenes in matters relating to the 
organization and management of scientific research data. In a previous 
communication (CODATA International Conference, 2008), the orientation 
of NRC-CISTI towards this objective and its short- and medium-term plans 
and strategies were presented. Since then, significant milestones have been 
achieved. This paper presents NRC-CISTI's most recent activities in these 
areas, which are progressing well alongside a strategic organizational 
redesign process that is realigning NRC-CISTI's structure, mission, and 
mandate to better serve its clients. Throughout this transformational phase, 
activities relating to data management remain vibrant. 


Open research, data sharing and data re-use have become a priority for 
publicly- and charity-funded research. Efficient data management naturally 
requires computational resources that assist in data description, preservation 
and discovery. While it is possible to fund development of data management 
systems, currently it is more difficult to sustain data resources beyond the 
original grants. That puts the safety of the data at risk and undermines the 
very purpose of data gathering. 


PlaSMo stands for 'Plant Systems-biology Modelling' and the PlaSMo model 
repository was envisioned by the plant systems biology community in 2005 
with the initial funding lasting until 2010. We addressed the sustainability of 
the PlaSMo repository and assured preservation of these data by 
implementing an exit strategy. For our exit strategy we migrated data to an 
alternative, public repository with secured funding. We describe details of our 
decision process and aspects of the implementation. Our experience may 
serve as an example for other projects in a similar situation. 


We share our reflections on the sustainability of biological data management 
and the future outcomes of its funding. We expect it to be a useful input for 
funding bodies. 